System, methods, device and apparatuses for performing simultaneous localization and mapping

ABSTRACT

Embodiments of the present disclosure are directed to various systems, methods and apparatuses for performing simultaneous localization and mapping (SLAM) for a wearable device, including without limitation, a head-mounted wearable device that optionally includes a display screen. Such embodiments enable accurate and quick localization of a wearable device within a dynamically constructed map, optionally through computations performed with a computational device (including those having limited resources). A non-limiting example of such a computational device is a smart cellular phone or other mobile computational device.

FIELD OF THE DISCLOSURE

The present disclosure, in at least some embodiments, is directed to systems, methods, and apparatuses for performing simultaneous localization and mapping (SLAM), and in particular, to such systems, methods, and apparatuses for performing SLAM with/for a wearable device.

BACKGROUND

The term SLAM refers to “Simultaneous Localization And Mapping,” and was initially applied to problems of independent movement of a mobile robot (device). In some such systems, the location of the mobile device (e.g., robot) on a map of an environment is necessary, as is a map of the environment itself, so that the mobile device can determine its relative location within that environment. In some known systems, however, these tasks cannot be performed simultaneously, which results in substantial delays when processing mobile device location information.

SLAM can be performed with sensor data from a number of different sensor types. Visual SLAM refers to the use of visual data from a visual sensor, such as for example a camera, to perform the SLAM process. In some cases, only such visual data is used for the SLAM process (see for example Visual Simultaneous Localization and Mapping: A Survey, Artificial Intelligence Review 43(1), November 2015).

Various types of sensors and the use of their data in the SLAM process are described in “Past, Present, and Future of Simultaneous Localization And Mapping: Towards the Robust-Perception Age”, Cadena et al, https://arxiv.org/pdf/1606.05830.pdf. This article also describes the importance of the “pose”, or position and orientation, for the SLAM process. The pose relates to the position and orientation of the robot or other entity to which the sensor is attached, while the map describes the environment for that robot.

Additionally, some known systems cannot dynamically determine the nature of the mobile device's environment, and therefore, cannot dynamically determine navigation instructions, and/or other information. For example, in some known systems, a navigator for the mobile device can input pre-determined environment data into the known system so as to provide a description of the environment. Such known systems, however, cannot modify the description of the environment substantially in real-time, based on new environmental information, and/or the like.

U.S. Pat. No. 9,367,811 describes a method for context aware localization, mapping, and tracking (CALMT). However, this method does not feature simultaneous localization and mapping, such that it is less useful than SLAM. Furthermore, the method is focused on computer vision, which is a more limited activity.

US Patent Application No. 2014/0125700 describes one method for performing SLAM with sensor data, but is restricted to use in situations that have geometric constraints that are known a priori, which must be provided to the SLAM system before it can begin operation.

U.S. Pat. No. 9,674,507 describes a monocular SLAM system that creates a 3D map with panoramic and 6DOF camera movements. The system is limited to generating maps only by creating keyframes first, analyzing features, and then potentially saving the keyframe as part of the map. The system requires either that all tasks operate in a single process, or that all mapping tasks be separated into another process, with all mapping tasks in that single process. The system does not provide further flexibility, given that mapping is done by keyframes only.

U.S. Pat. No. 9,390,344 describes a SLAM system that uses various motion sensors and that maps and tracks simultaneously using threaded processes. The system described similarly uses keyframes only to create maps.

Thus, a need exists for methods, apparatuses, and systems that can dynamically determine the location of a mobile device, dynamically determine the nature of the mobile device's environment, and can efficiently determine actions for the mobile device to take based on the dynamically-determined information. Methods and systems for mapping operations without any a priori constraints are also needed.

SUMMARY OF SOME OF THE EMBODIMENTS

Embodiments of the present disclosure include systems, methods and apparatuses for performing simultaneous localization and mapping (SLAM) which address the above-noted shortcomings.

In some embodiments, a SLAM system is provided for a wearable device, including without limitation, a head-mounted wearable device that optionally includes a display screen. Such systems, methods and apparatuses can be configured to accurately (and in some embodiments, quickly) localize a wearable device within a dynamically constructed map, e.g., through computations performed with a computational device. A non-limiting example of such a computational device is a smart cellular phone or other mobile computational device.

According to at least some embodiments, SLAM systems, methods and apparatuses can support a VR (virtual reality) application, an AR (augmented reality) application, and/or the like.

According to at least some embodiments, there is provided a wearable apparatus, comprising: a monocular optical sensor, a computational device, and a simultaneous localization and mapping (SLAM) analyzer operational on the computational device and configured for receiving optical sensor data from said sensor, the SLAM analyzer comprising: a localization module, a fast mapping module configured to rapidly create a dynamically constructed map from said sensor data, and a map refinement module to refine said dynamically constructed map according to said sensor data; wherein said SLAM analyzer is configured to localize the sensor according to said optical sensor data within said dynamically constructed map according to a SLAM process; each of said localization module, said fast mapping module and said map refinement module is configured to operate at a separate process speed of said computational device; and said localization module localizes said sensor in said dynamically constructed map according to said sensor data.

Optionally said computational device comprises a mobile computational device.

Optionally said computational device comprises a cellular phone.

Optionally the apparatus further comprises headgear for mounting said apparatus to a user, wherein said cellular phone comprises said sensor, and said cellular phone is mounted on said headgear.

Optionally said computational device comprises a hardware processor configured to perform a predefined set of basic operations in response to receiving a corresponding basic instruction selected from a predefined native instruction set of codes, and memory; said SLAM analyzer comprises: a first set of machine codes selected from the native instruction set for receiving said optical sensor data, a second set of machine codes selected from the native instruction set for operating said localization module, a third set of machine codes selected from the native instruction set for operating said fast mapping module, and a fourth set of machine codes selected from the native instruction set for operating said map refinement module; and each of the first, second, third and fourth sets of machine code is stored in the memory.

Optionally said hardware processor operates said map refinement module at a process speed that is at least 50% slower than said fast mapping module.

Optionally said localization module comprises a tracking processor, said tracking processor operates at a separate process speed from each of a fast mapping processor and a map refinement processor; said process speed of said tracking processor is at least five times faster than said process speed of said fast mapping processor; and said tracking processor locates said sensor according to said sensor data and according to a last known position of said sensor on said map.

Optionally said tracking processor reduces jitter by spreading error across localizations.

Optionally said map refinement processor is configured to calibrate said sensor according to a difference estimate between said map before and after said map refinement processor refines said map.

Optionally said map refinement processor is configured to correct for drift caused by said fast mapping processor.

Optionally said map refinement processor is configured to perform map refinement by bundle adjustment.

Optionally the apparatus further comprises a sensor preprocessor operated by said computational device, said sensor comprises a camera, said data comprises video data, and said sensor preprocessor further comprises a calibration module for calibrating said video data of said camera according to a calibration process.

Optionally said calibration process includes at least one of determining lens distortion and focal length.

Optionally said calibration module is configured to calibrate said camera according to a model of said camera and/or of said cellular phone.

Optionally said sensor preprocessor comprises a sensor abstraction interface for abstracting data from said sensor.

Optionally said sensor comprises a camera, said data comprises video data, said localization module is configured to reduce jitter while determining a location a plurality of times according to at least one of maintaining a constant error, mixing frame-to-frame with keyframe-to-frame tracking, applying a Kalman filter, and a combination thereof.

Optionally said sensor comprises a camera, said data comprises video data, the apparatus further comprising a sensor preprocessor operated by said computational device, and said sensor preprocessor further comprises a sensor data preprocessor configured for converting said video data to grayscale if necessary and then applying a Gaussian pyramid to said grayscale video data.

Optionally said SLAM analyzer is configured to localize the sensor only according to said optical sensor data.

Optionally said optical sensor data comprises video data and wherein said SLAM analyzer is configured to perform an initialization process comprising: a keypoints reference frame detection process configured to select an image as a reference frame; a keypoints detection process configured for detecting a plurality of keypoints on the reference frame; and an initial map creation process configured for creating an initial map from said keypoints.

Optionally said keypoints detection process comprises an LK (Lucas-Kanade) process.

Optionally said initialization process further comprises a verification process configured to verify validity of the tracked points.

Optionally said verification process comprises an NCC (Normalized Cross-Correlation) process.

Optionally said initialization process further comprises a pose calculation process configured to calculate a pose of said optical sensor before said initial map creation process creates said initial map.

Optionally said pose calculation process comprises applying homography and/or an essential matrix to said keypoints to determine the pose.

Optionally said pose calculation process comprises applying said homography and said essential matrix, and determining which of said applying provides a more accurate result.

Optionally said pose calculation process comprises applying said homography first to determine if a sufficiently accurate result is obtained; if said sufficiently accurate result is not obtained, applying said essential matrix.

Optionally said pose calculation process comprises a RANSAC process.

Optionally said pose calculation process further comprises estimating said essential matrix according to a process selected from the group consisting of GOODSAC and RANSAC.
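
As one non-limiting illustration of the pose calculation options above, the following sketch (in Python, assuming the OpenCV library and two arrays of matched keypoints) tries a RANSAC homography first and falls back to a RANSAC essential matrix; the camera matrix K, the inlier-ratio threshold, and the function names are illustrative assumptions rather than elements of this disclosure.

```python
# Hypothetical sketch: pose from matched keypoints, trying a homography first
# and falling back to an essential matrix, both estimated with RANSAC.
import cv2
import numpy as np

def estimate_pose(pts_ref, pts_cur, K, min_inlier_ratio=0.7):
    """pts_ref, pts_cur: Nx2 float32 arrays of matched keypoints; K: 3x3 camera matrix."""
    # Attempt a homography (works well for planar or rotation-dominant scenes).
    H, mask_h = cv2.findHomography(pts_ref, pts_cur, cv2.RANSAC, 3.0)
    if H is not None and mask_h.sum() / len(pts_ref) >= min_inlier_ratio:
        # Decompose into candidate rotations/translations; a real system would
        # disambiguate using cheirality (points in front of both cameras).
        _, rotations, translations, _ = cv2.decomposeHomographyMat(H, K)
        return rotations[0], translations[0], "homography"
    # Otherwise fall back to the essential matrix for general 3D motion.
    E, mask_e = cv2.findEssentialMat(pts_ref, pts_cur, K, method=cv2.RANSAC,
                                     prob=0.999, threshold=1.0)
    _, R, t, _ = cv2.recoverPose(E, pts_ref, pts_cur, K)
    return R, t, "essential"
```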

Optionally said map is generated without a priori constraint.

Optionally said map is generated de novo.

Optionally the apparatus further comprises at least one of an accelerometer, a gyroscope, a magnetometer, a barometric pressure sensor, a GPS (global positioning system) sensor, a microphone or other audio sensor, a proximity sensor, a temperature sensor, a UV (ultraviolet light) sensor, a depth sensor, and an IMU (inertial measurement unit).

Optionally said IMU comprises an accelerometer and a gyroscope.

Optionally said IMU further comprises a magnetometer.

Optionally during an initialization of said SLAM analyzer, said optical sensor data and said IMU data are interpolated according to a time based interpolation method, followed by initial bundle adjustment of interpolated data.

Optionally said SLAM analyzer is additionally configured to determine displacement of at least said optical sensor according to a combination of translation of said optical sensor and rotation of said IMU.

Optionally said SLAM analyzer is additionally configured to integrate rotation of said IMU of a first pose to determine a second pose of said optical sensor.

Optionally said SLAM analyzer is further configured to operate a loop closure process, and update said map according to said second pose, followed by performing said loop closure process.

Optionally said optical sensor comprises a camera selected from the group consisting of an RGB camera, a color camera, a grayscale camera, an infrared camera, a charge-coupled device (CCD), and a CMOS sensor.

Optionally said SLAM analyzer is additionally configured to perform a SLAM process comprising: selecting a plurality of keyframes of said video data; determining a plurality of features of each of said keyframes; warping a plurality of patches of said keyframes around said plurality of features; performing an NCC (normalized cross-correlation) process on said warped keyframe patches; and determining a location of said optical sensor according to said NCC process.
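
The following is a minimal sketch of the NCC step described above, assuming grayscale image data as NumPy arrays; the patch size, search radius, and helper names are illustrative assumptions only.

```python
# Hypothetical sketch of the NCC step: score a warped keyframe patch against
# candidate locations in the current frame around a predicted feature position.
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation of two equally sized grayscale patches."""
    a = a.astype(np.float64) - a.mean()
    b = b.astype(np.float64) - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return (a * b).sum() / denom if denom > 0 else 0.0

def best_match(frame, warped_patch, predicted_xy, search_radius=8):
    """Search a small window around the predicted location for the best NCC score."""
    ph, pw = warped_patch.shape
    px, py = predicted_xy
    best_score, best_xy = -1.0, predicted_xy
    for dy in range(-search_radius, search_radius + 1):
        for dx in range(-search_radius, search_radius + 1):
            x, y = px + dx, py + dy
            if x < 0 or y < 0:
                continue
            cand = frame[y:y + ph, x:x + pw]
            if cand.shape != warped_patch.shape:
                continue
            score = ncc(cand, warped_patch)
            if score > best_score:
                best_score, best_xy = score, (x, y)
    return best_xy, best_score
```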

Optionally said SLAM process further comprises determining a displacement estimate from a previous known location of said optical sensor, and said determining said location of said optical sensor according to said NCC process comprises applying a result of said NCC process to said displacement estimate.

Optionally said selecting said plurality of keyframes of said video data further comprises selecting a plurality of keyframes from said dynamically constructed map according to a plurality of feature points on said dynamically constructed map.

Optionally said SLAM process further comprises reducing jitter by maintaining a consistent error across analysis of a plurality of frames.

Optionally said SLAM process further comprises: determining if relocalization of said optical sensor is required according to a determination of reliability of said location of said optical sensor, and if so, performing relocalization by comparing a plurality of features of said keyframes to determine said previous known location of said optical sensor, performing said selecting a plurality of keyframes of said video data; determining a plurality of features of each of said keyframes; warping a plurality of patches of said keyframes around said plurality of features; performing an NCC (normalized cross-correlation) process on said warped keyframe patches; and determining a location of said optical sensor according to said NCC process.

Optionally said comparing said plurality of features of said keyframes comprises: determining a descriptor for each feature; sorting said descriptors for similarity; sorting said keyframes according to similar descriptors; and comparing said sorted descriptors to a plurality of known landmarks on said dynamically constructed map appearing on said sorted keyframes.

Optionally said sorting said descriptors for similarity is performed with a vocabulary tree.

Optionally said comparing said sorted descriptors to a plurality of known landmarks on said dynamically constructed map appearing on said sorted keyframes further comprises removing outliers and determining said previous known location.

Optionally said removing outliers and determining said previous known location is performed according to RANSAC.

Optionally said determining said location comprises: searching for a known landmark on a plurality of selected keyframes; if said known landmark is not found on said plurality of selected keyframes, determining said known landmark to be invalid; and if said known landmark is found on at least one of said plurality of selected keyframes, determining said known landmark to be validated.

Optionally said SLAM analyzer further comprises a map collaboration processor configured for communicating map information to and receiving map information from at least one additional SLAM analyzer external to the apparatus.

Optionally said SLAM analyzer further comprises a map changes processor, and said map changes processor is configured to detect a change in the environment represented by said map.

Optionally the apparatus further comprises an object application, which may also be termed herein an outside application, operated by said computational device and configured for manipulating, locating or representing an object, wherein said map changes processor is configured to inform said object application that: a particular object has been moved, a particular object has disappeared from its last known location, or a new specific object has appeared. The object as described herein is a physical object in the physical world, but mapped onto the map as described herein.

Optionally said object application comprises a VR (virtual reality) application or an AR (augmented reality) application.

Optionally said object application is an AR application, said SLAM analyzer further comprising a real object locator, said real object locator is configured to determine a location and geometry of a physical object in an environment external to the apparatus, and provides said location and geometry to said AR application.

Optionally the apparatus further comprises a housing for housing said optical sensor.

Optionally said housing further houses said computational device.

Optionally said computational device is located separately from said housing.

Optionally said computational device is located remotely from said housing.

According to at least some embodiments, there is provided a wearable apparatus, comprising: a sensor; a computational device; and a simultaneous localization and mapping (SLAM) analyzer configured for receiving data from said sensor and for being operated by said computational device, wherein: said SLAM analyzer is configured to localize the apparatus according to said sensor data within a dynamically constructed map according to a SLAM process; said sensor comprises a camera, said data comprises video data from said camera, said SLAM process comprises: selecting a plurality of keyframes of said video data; determining a plurality of features of each of said keyframes; warping a plurality of patches of said keyframes around said plurality of features; performing an NCC (normalized cross-correlation) process on said warped keyframe patches; and determining a location of said wearable device according to said NCC process.

Optionally said SLAM process further comprises determining a displacement estimate from a previous known location of said wearable device, and said determining said location of said wearable device according to said NCC process comprises applying a result of said NCC process to said displacement estimate.

Optionally said selecting said plurality of keyframes of said video data further comprises selecting a plurality of keyframes from said dynamically constructed map according to a plurality of feature points on said dynamically constructed map.

Optionally said SLAM process further comprises reducing jitter by maintaining a consistent error across analysis of a plurality of frames.

Optionally said SLAM process further comprises: determining if relocalization of said wearable device is required according to a determination of reliability of said location of said wearable device, and if so, performing relocalization by comparing a plurality of features of said keyframes to determine said previous known location of said wearable device, performing said selecting a plurality of keyframes of said video data; determining a plurality of features of each of said keyframes; warping a plurality of patches of said keyframes around said plurality of features; performing an NCC (normalized cross-correlation) process on said warped keyframe patches; and determining a location of said wearable device according to said NCC process.

Optionally said comparing said plurality of features of said keyframes comprises: determining a descriptor for each feature; sorting said descriptors for similarity; sorting said keyframes according to similar descriptors; and comparing said sorted descriptors to a plurality of known landmarks on said dynamically constructed map appearing on said sorted keyframes.

Optionally said sorting said descriptors for similarity is performed with a vocabulary tree.

Optionally said comparing said sorted descriptors to a plurality of known landmarks on said dynamically constructed map appearing on said sorted keyframes further comprises removing outliers and determining said previous known location.

Optionally said removing outliers and determining said previous known location is performed according to RANSAC.

Optionally said determining said location comprises: searching for a known landmark on a plurality of selected keyframes; if said known landmark is not found on said plurality of selected keyframes, determining said known landmark to be invalid; and if said known landmark is found on at least one of said plurality of selected keyframes, determining said known landmark to be validated.

Optionally the apparatus further comprises an AR (augmented reality) application, wherein: said SLAM analyzer further comprises an obstacle avoidance processor; and said obstacle avoidance processor is configured to determine a location and geometry of each validated landmark that is a potential obstacle and to communicate said location and geometry to said AR application.

Optionally the apparatus further comprises a VR (virtual reality) application, wherein said SLAM analyzer further comprises an obstacle avoidance processor configured to determine a location and geometry of each validated landmark that is a potential obstacle and to communicate said location and geometry to said VR application.

Optionally said sensor comprises a plurality of cameras and wherein said video data is analyzed at least as stereo image data.

Optionally the apparatus further comprises an IMU, wherein said SLAM analyzer is further configured to analyze said IMU data for said SLAM process.

Optionally said SLAM analyzer is further configured to interpolate said optical data and said IMU data by said SLAM process.

Optionally said SLAM process is configured to interpolate said optical sensor data and said IMU data, and calculate a quaternion interpolation of said optical sensor data and said IMU data.

Optionally said SLAM process further comprises determining an initialization error for said IMU, and weighting said quaternion interpolation according to said initialization error.

Optionally said quaternion interpolation comprises a weighted SLERP interpolation.
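
A weighted SLERP could, for example, take a form such as the following sketch; the specific weighting of the blend parameter by an IMU initialization error is an illustrative assumption, not the exact formula of this disclosure.

```python
# Hypothetical sketch of a weighted SLERP between a camera-derived orientation and an
# IMU-derived orientation; the confidence weighting is an illustrative assumption.
import numpy as np

def slerp(q0, q1, t):
    """Spherical linear interpolation between unit quaternions q0 and q1 (as 4-vectors)."""
    q0, q1 = q0 / np.linalg.norm(q0), q1 / np.linalg.norm(q1)
    dot = np.dot(q0, q1)
    if dot < 0.0:            # take the short arc
        q1, dot = -q1, -dot
    if dot > 0.9995:         # nearly parallel: fall back to normalized lerp
        q = q0 + t * (q1 - q0)
        return q / np.linalg.norm(q)
    theta = np.arccos(dot)
    return (np.sin((1.0 - t) * theta) * q0 + np.sin(t * theta) * q1) / np.sin(theta)

def weighted_orientation(q_cam, q_imu, imu_init_error, max_error=1.0):
    """Pull the blend toward the camera estimate as the IMU initialization error grows."""
    imu_weight = max(0.0, 1.0 - imu_init_error / max_error)
    return slerp(q_cam, q_imu, imu_weight)
```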

Optionally said IMU comprises a magnetometer, said apparatus further comprises a magnetometer separate from said IMU or a combination thereof, and wherein said SLAM process further comprises determining translation of said magnetometer according to magnetometer data, and applying said translation to said interpolated optical sensor data and IMU data.

According to some embodiments there is provided a SLAM apparatus configured for performing a simultaneous localization and mapping (SLAM) process, comprising: a computational device, a SLAM analyzer operated by or operational on said computational device, an optical sensor in communication with said computational device, an IMU in communication with said computational device, wherein said IMU comprises an accelerometer and a gyroscope; and a structure for causing said optical sensor and said IMU to move in tandem; wherein: said computational device is configured to receive sensor data from said optical sensor and from said IMU for being analyzed by said SLAM analyzer; and said SLAM analyzer is configured to perform a SLAM process to create a map and to localize one or both of said optical sensor and said IMU in said map according to said optical sensor data and said IMU data, according to a time based localization method.

Optionally said structure comprises a housing for housing said optical sensor and said IMU.

Optionally said housing further houses said computational device.

Optionally said SLAM process further comprises performing initial bundle adjustment according to a spline, wherein said spline is determined according to said optical sensor data and said IMU data, and wherein a second derivative of said spline is determined according to accelerometer data.
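
As a rough illustration of this option, the sketch below fits a cubic spline to camera positions over time and compares its second derivative against gravity-compensated accelerometer readings to form residuals for an initial bundle adjustment; the use of SciPy, the gravity handling, and the function names are assumptions for illustration only.

```python
# Hypothetical sketch: spline over camera positions, with its second derivative
# compared against accelerometer readings (gravity removed) to form residuals.
import numpy as np
from scipy.interpolate import CubicSpline

def accel_residuals(key_times, key_positions, imu_times, imu_accel_world, gravity):
    """key_positions: Kx3 camera positions at key_times; imu_accel_world: Mx3 accelerations
    already rotated into the world frame; gravity: 3-vector, e.g. [0, 0, -9.81]."""
    spline = CubicSpline(key_times, key_positions, axis=0)
    predicted_accel = spline(imu_times, 2)          # second derivative of position
    measured_accel = imu_accel_world - gravity      # remove gravity from IMU readings
    return (predicted_accel - measured_accel).ravel()
```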

Optionally said IMU comprises a magnetometer, said apparatus further comprises a magnetometer separate from said IMU or a combination thereof, and wherein said SLAM process further comprises determining translation of said magnetometer according to magnetometer data; wherein said SLAM process further comprises applying said translation to said interpolated optical sensor data and IMU data.

According to at least some embodiments, there is provided a SLAM method configured for performing SLAM for a wearable apparatus comprising a sensor, a computational device, and a simultaneous localization and mapping (SLAM) analyzer operated by the computational device, the method comprising: receiving sensor data from said sensor by said SLAM analyzer; and performing a SLAM process by said SLAM analyzer, said SLAM process comprising: simultaneously dynamically constructing a map and locating the apparatus according to said sensor data within said dynamically constructed map, wherein said SLAM process is adapted to be performed by the limited resources of said computational device; wherein said performing said SLAM process comprises: performing a fast mapping process to rapidly create said dynamically constructed map from said sensor data; performing a localization process to localize said wearable device in said dynamically constructed map according to said sensor data; and performing a map refinement process to refine said dynamically constructed map according to said sensor data; each of said fast mapping process and said map refinement process is operated at a separate process speed of said computational device, and said map refinement process operates at a process speed that is at least 50% slower than a process speed of said fast mapping process so as to adapt said SLAM process to be performed by said computational device.

According to at least some embodiments, there is provided a SLAM method for performing SLAM for a wearable apparatus comprising a sensor and a computational device, wherein said sensor comprises a camera providing video data; the method comprising: receiving video data from said camera by said computational device; simultaneously dynamically constructing a map and locating the apparatus according to said video data within said dynamically constructed map, by selecting a plurality of keyframes of said video data; determining a plurality of features of each of said keyframes; warping a plurality of patches of said keyframes around said plurality of features; performing an NCC (normalized cross-correlation) process on said warped keyframe patches; and determining a location of said wearable device on said dynamically constructed map according to said NCC process.

Optionally the method further comprises adding IMU data for a more efficient and/or accurate SLAM process.

Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs. The materials, methods, and examples provided herein are illustrative only and not intended to be limiting.

Various embodiments of the methods, systems and apparatuses of the present disclosure can be implemented by hardware and/or by software or a combination thereof. For example, as hardware, selected steps of methodology according to some embodiments can be implemented as a chip and/or a circuit. As software, selected steps of the methodology (e.g., according to some embodiments of the disclosure) can be implemented as a plurality of software instructions being executed by a computer (e.g., using any suitable operating system). Accordingly, in some embodiments, selected steps of methods, systems and/or apparatuses of the present disclosure can be performed by a processor (e.g., executing an application and/or a plurality of instructions).

Although embodiments of the present disclosure are described with regard to a “computer”, and/or with respect to a “computer network,” it should be noted that optionally any device featuring a processor and the ability to execute one or more instructions is within the scope of the disclosure, such as may be referred to herein as simply a computer or a computational device, and which includes (but is not limited to) any type of personal computer (PC), a server, a cellular telephone, an IP telephone, a smartphone, a PDA (personal digital assistant), a thin client, a mobile communication device, a smartwatch, a head mounted display or other wearable that is able to communicate wired or wirelessly with a local or remote device. To this end, any two or more of such devices in communication with each other may comprise a “computer network.”

BRIEF DESCRIPTION OF THE DRAWINGS

Embodiments of the disclosure are herein described, by way of example only, with reference to the accompanying drawings. With specific reference now to the drawings in detail, it is stressed that the particulars shown are by way of example and for purposes of illustrative discussion of the various embodiments of the present disclosure only, and are presented in order to provide what is believed to be a useful and readily understood description of the principles and conceptual aspects of the various embodiments of the inventions disclosed herein.

FIG. 1A shows a schematic of a non-limiting example of a SLAM system, according to at least some embodiments;

FIG. 1B shows a schematic of a non-limiting example of a wearable device, according to at least some embodiments;

FIG. 1C shows a schematic of a non-limiting example of a combination of a wearable device and a computational device, according to at least some embodiments;

FIG. 1D shows another schematic of a non-limiting example of a combination of a wearable device, a local data processing system, and a remote data processing system, according to at least some embodiments;

FIG. 2A shows a schematic of a non-limiting example of a sensor preprocessor according to at least some embodiments;

FIG. 2B shows a schematic of a non-limiting example of a SLAM analyzer according to at least some embodiments;

FIG. 2C shows a schematic of a non-limiting example of a mapping module according to at least some embodiments;

FIG. 3A shows a schematic of another non-limiting example of a system according to at least some embodiments;

FIG. 3B shows a schematic of a non-limiting example implementation of a computational device operating at least some components of the system according to at least some embodiments;

FIG. 3C shows a schematic of another non-limiting example implementation of a computational device operating at least some components of the system according to at least some embodiments;

FIG. 4 shows a non-limiting exemplary method for performing SLAM according to at least some embodiments;

FIG. 5 shows a non-limiting exemplary method for performing localization according to at least some embodiments;

FIG. 6 shows another non-limiting example of a method for performing localization according to at least some embodiments;

FIG. 7 shows a non-limiting example of a method for updating system maps according to map refinement, according to at least some embodiments of the present invention;

FIG. 8 shows a non-limiting example of a method for validating landmarks according to at least some embodiments of the present invention;

FIGS. 9A and 9B are example logic flow diagrams illustrating the performance of actions in a VR environment, according to at least some embodiments;

FIGS. 10A and 10B are example logic flow diagrams illustrating the performance of actions in an AR environment, according to at least some embodiments;

FIG. 11 shows an exemplary, non-limiting flow diagram for performing SLAM according to at least some embodiments;

FIGS. 12A-12D show a detailed, exemplary, non-limiting flow diagram for performing SLAM according to at least some embodiments;

FIG. 13A shows a schematic graph of accelerometer data;

FIG. 13B shows an exemplary, non-limiting flow diagram for determining the coordinates scale and gravity vector from IMU (Inertial Measurement Unit) data according to at least some embodiments;

FIG. 13C shows an exemplary, non-limiting flow diagram for pose prediction according to at least some embodiments;

FIG. 14 shows an exemplary, non-limiting system for visual-inertial SLAM with IMU (inertial measurement unit) data according to at least some embodiments;

FIG. 15A shows an exemplary, non-limiting flow diagram for SLAM initialization according to at least some embodiments;

FIG. 15B shows an exemplary, non-limiting flow diagram for initial bundle adjustment with IMU data according to at least some embodiments;

FIG. 16 shows an exemplary, non-limiting flow diagram for SLAM initialization with interpolation of IMU data according to at least some embodiments;

FIG. 17A shows an exemplary, non-limiting flow diagram for determining a key moment according to at least some embodiments; and

FIG. 17B shows an exemplary, non-limiting schematic diagram of a spline with a plurality of key moments and key frames.

DETAILED DESCRIPTION OF SOME OF THE EMBODIMENTS

FIG. 1A shows a schematic of a non-limiting example of a simultaneous localization and mapping (SLAM) system, according to at least some embodiments of the present disclosure. In some implementations, SLAM system 100 can include at least one computational device/computer 107 (as indicated earlier, the terms/phrases of computer, processor and computational device can be used interchangeably in the present disclosure), a wearable device 105, and one or more sensors 103. The computational device 107 can include a sensor preprocessor 102 and a SLAM analyzer 104, can be operatively coupled to the wearable device 105 (e.g., wired or wirelessly), can be included in the wearable device 105, and/or some combination thereof. Sensor preprocessor 102 and SLAM analyzer 104 can be separate processors in and of themselves in the computational device, or may be software modules (e.g., an application program and/or a set of computer instructions for performing SLAM functionality, operational on one or more processors). In some implementations, the computational device 107 can be configured to receive signal data (e.g., from the wearable device 105), to preprocess the signal data so as to determine movement of the wearable device, and to instruct the wearable device to perform one or more actions based on the movement of the wearable device. Specifically, in some implementations, sensor preprocessor 102 can receive the sensor data from the wearable device 105, and can perform preprocessing on the sensor data. For example, sensor preprocessor 102 can generate abstracted sensor data based on the sensor data.

SLAM analyzer 104 is configured to operate a SLAM process so as to determine a location of wearable device 105 within a computational device-generated map, as well as being configured to determine a map of the environment surrounding wearable device 105. For example, the SLAM process can be used to translate movement of the user's head and/or body when wearing the wearable device (e.g., on the user's head or body). A wearable device that is worn on the user's head would, for example, provide movement information with regard to turning the head from side to side, or up and down, and/or moving the body in a variety of different ways. Such movement information is needed for SLAM to be performed.

In some implementations, because the preprocessed sensor data is abstracted from the specific sensors, the SLAM analyzer 104 can be sensor-agnostic, and can perform various actions without knowledge of the particular sensors from which the sensor data was derived.

As a non-limiting example, if sensor 103 is a camera (e.g., a digital camera with a resolution, for example, of 640×480 or greater, at any frame rate including, for example, 60 fps), then movement information may be determined by SLAM analyzer 104 according to a plurality of images from the camera. For such an example, sensor preprocessor 102 preprocesses the images before SLAM analyzer 104 performs the analysis (which may include, for example, converting images to grayscale). Next, a Gaussian pyramid may be computed for one or more images, which is also known as a MIPMAP (multum in parvo map), in which the pyramid starts with a full resolution image, and the image is operated on multiple times, such that each time, the image is half the size and half the resolution of the previous operation. SLAM analyzer 104 may perform a wide variety of different variations on the SLAM process, including one or more of, but not limited to, PTAM (Parallel Tracking and Mapping), as described for example in “Parallel Tracking and Mapping on a Camera Phone” by Klein and Murray, 2009 (available from ieeexplore.ieee.org/document/5336495/); DSO (Direct Sparse Odometry), as described for example in “Direct Sparse Odometry” by Engel et al, 2016 (available from https://arxiv.org/abs/1607.02565); or any other suitable SLAM method, including those as described herein.
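
As a minimal sketch of such preprocessing, assuming OpenCV, the following converts a frame to grayscale if needed and builds a Gaussian pyramid in which each level is half the size and half the resolution of the previous one; the number of levels is an illustrative choice.

```python
# Minimal sketch, assuming OpenCV: grayscale conversion plus a Gaussian pyramid.
import cv2

def build_pyramid(frame, levels=4):
    if frame.ndim == 3:                           # color input: convert to grayscale first
        frame = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    pyramid = [frame]
    for _ in range(levels - 1):
        pyramid.append(cv2.pyrDown(pyramid[-1]))  # Gaussian blur + downsample by 2
    return pyramid
```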

In some implementations, the wearable device 105 can be operatively coupled to the one or more sensor(s) 103 and the computational device 107 (e.g., wired, wirelessly). The wearable device 105 can be a device (such as an augmented reality (AR) and/or virtual reality (VR) headset, and/or the like) configured to receive sensor data, so as to track a user's movement when the user is wearing the wearable device 105. The wearable device 105 can be configured to send sensor data from the one or more sensors 103 to the computational device 107, such that the computational device 107 can process the sensor data to identify and/or contextualize the detected user movement.

In some implementations, the one or more sensors 103 can be included in wearable device 105 and/or separate from wearable device 105. A sensor 103 can be one of a camera (as indicated above), an accelerometer, a gyroscope, a magnetometer, a barometric pressure sensor, a GPS (global positioning system) sensor, a microphone or other audio sensor, a proximity sensor, a temperature sensor, a UV (ultraviolet light) sensor, an IMU (inertial measurement unit), and/or other sensors. If implemented as a camera, sensor 103 can be one of an RGB, color, grayscale or infrared camera, a charge-coupled device (CCD), a CMOS sensor, a depth sensor, and/or the like. If implemented as an IMU, sensor 103 can be an accelerometer, a gyroscope, a magnetometer, a combination of two or more of same, and/or the like. When multiple sensors 103 are operatively coupled to and/or included in the wearable device 105, the sensors 103 can include one or more of the aforementioned types of sensors.

FIG. 1B shows a schematic of a non-limiting example of a wearable device 105 according to at least some embodiments. For example, in some implementations, a wearable device 105 can include a processor 130, a communicator 132, a memory 134, a display 136, a clock 142, a power supply 138, and/or a number of sensors 103. In some implementations, each of the communicator 132, the memory 134, the display 136, the clock 142, and the power supply 138 can be operatively coupled to the processor 130. In implementations where the sensors 103 are operatively coupled to the wearable device 105, the sensors can be operatively coupled to the processor 130 (e.g., via the communicator 132); in implementations where the sensors are included in the wearable device 105, the sensors can be directly and operatively coupled to the processor 130.

Throughout the present disclosure, a “module” may refer to a designated circuit, a software application, a set of computer instructions/software operational on a processor, or a processor itself (e.g., an ASIC), for carrying out noted functionality.

In some implementations, the processor 130 can be a general purpose processor, a Field Programmable Gate Array (FPGA), an Application Specific Integrated Circuit (ASIC), a Digital Signal Processor (DSP), and/or the like. The memory 134 can be a hardware module and/or component configured to store data accessible by the processor 130, and/or to store code representing executable instructions for the processor 130. The memory 134 can be, for example, a random access memory (RAM), a memory buffer, a hard drive, a database, an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), a read-only memory (ROM) and/or so forth. In some embodiments, the memory 134 stores instructions to cause the processor 130 to execute modules, processes and/or functions associated with the wearable device 105. The processor 130 can be configured to implement instructions stored in the memory 134. The memory 134 can be configured to store processor-readable instructions that are accessible and executable by the processor 130.

In some implementations, the communicator 132 can be an external communication channel device, including but not limited to a device for communicating on WiFi and/or cellular networks, through Bluetooth, through infrared, and/or through a similar communication modality. The communicator 132 can be operatively coupled to other electronic devices, e.g., such as the computational device 107, the sensors 103, and/or the like, and can be configured to send and/or receive data to and/or from the other electronic devices. In some implementations, the display 136 can be one of an audio, video, haptic feedback, and/or vibration display. In some implementations, display 136 can be configured to display image, video, and/or other data. In some implementations, power supply 138 can be configured to supply power to wearable device 105, for example through a battery and/or through an external power source. Processor 130 can also control a clock 142. In some implementations, the processor 130 can control a number of different sensors 103, e.g., including but not limited to a camera 144, an IMU 146, and/or one or more other sensors 148.

In some implementations, wearable device 105 can be an electronic device that is wearable and/or portable for a user, e.g., including a headset device, a helmet device, a mobile device (e.g., such as a cellular telephone, a laptop, a tablet, and/or a similar device), and/or other such electronic devices. As one non-limiting example, a wearable device 105 can be a smartphone device operatively coupled to a head mount. The smartphone can include a number of sensors (e.g., such as a camera, an accelerometer, a gyroscope, an IR sensor, and/or other sensors). The wearable device 105 can be configured to receive sensor data from the sensors and send the sensor data to the computational device 107. In some implementations, the computational device can be included in the wearable device 105.

Optionally sensor 103 and wearable device 105 are contained in a single housing (not shown). Optionally computational device 107 is also contained within the housing. Alternatively, computational device 107 is external to the housing. Also alternatively, computational device 107 is remote from the housing, such that computational device 107 is located at a distance of at least 5 cm, a distance of at least 10 cm, a distance of at least 20 cm, any distance in between, or a greater distance.

FIG. 1C shows a non-limiting, example, illustrative schematic combination of a wearable device and a computational device according to at least some embodiments, shown as a system 170. For example, in some implementations, system 170 can include computational device 107, wearable device 105, sensor preprocessor 102, SLAM analyzer 104 and application logic 171. In some implementations, the system 170 can also include one or more sensor(s) 103; in other implementations, the one or more sensors may be external to the system 170, and can be operatively coupled to system 170 so as to provide sensor data to the system 170. The application logic 171 can be implemented via hardware or software, and can be configured to support the operation, for example, of a VR and/or AR application. In some implementations, system 170 can also include a display 174 (e.g., similar to display 136 as described in FIG. 1B) configured to display the output of application logic 171, such as information related to operation of a VR or AR application. Display 174 can be one or more of an audio, video, haptic feedback or vibration display.

FIG. 1D shows another non-limiting, exemplary, illustrative schematic combination of a wearable device 105 and a computational device 107 according to at least some embodiments, shown as a system 176. As shown, a system 176 can include a wearable device 105 such as a pair of smart glasses 178. Glasses 178 can include a display 180 similar to display 136 described in FIG. 1B. In some implementations, the glasses 178 can be operatively coupled to, for example, a local data processing system 182 (corresponding to the sensor preprocessor 102 of computational device 107), and optionally a remote processing system (according to some embodiments). Local data processing system 182 can, in turn, be operatively coupled to a remote data processing system 192 (e.g., corresponding to SLAM analyzer 104 and/or a similar analytics device), for example through a network 190. Network 190 can be a wired or wireless network, and can be one of a local area network (LAN), a cellular network, a wireless network (e.g., such as WiFi), a Bluetooth and/or similar network, and/or the like.

Local data processing system 182 can include, in some implementations, a local data processing module 184 (which may be referred to as a processor or module and may be hardware or software), a local data storage 186 and a local interface 188. The sensor(s) 103 can be configured to transmit sensor data to glasses 178, which are configured to transmit the sensor data to local data processor 184. Local data processor 184 can be configured to preprocess the sensor data. Local data processor 184 can also be configured to store the data in local data storage 186, and/or to transmit the data through local interface 188 and network 190 to the remote data processing system 192.

When the local data processing system 182 sends preprocessed sensor data to the remote data processing system 192, the remote data interface 194 of remote data processing system 192 can receive the preprocessed sensor data, and can store the preprocessed sensor data in remote data storage 198. The remote data processor 196 can be configured to analyze the data. For example, the remote data processor 196 can be configured to determine where the glasses 178 are oriented and/or where the glasses 178 have moved, using the preprocessed sensor data. In some implementations, the remote data processor 196 can be configured to determine other information relating to the glasses 178 based on the preprocessed sensor data. The remote data processor can then be configured to send the results of the analysis of the preprocessed sensor data to local data processing system 182, e.g., via the network 190. The local data processing system 182 can be configured to use the results to alter information displayed by display 180 in the glasses 178 (e.g., to alter an area of vision within a virtual environment, and/or the like).

FIG. 2A shows a non-limiting, exemplary, illustrative schematic sensor preprocessor 102 according to at least some embodiments. As shown, sensor preprocessor 102 can include a sensor abstraction interface 200, a calibration processor 202 and a sensor data preprocessor 204. Sensor abstraction interface 200 can abstract the incoming sensor data (for example, abstract incoming sensor data from a plurality of different sensor types), such that sensor preprocessor 102 preprocesses sensor-agnostic sensor data.

In some implementations, calibration processor 202 can be configured to calibrate the sensor input, such that the input from individual sensors and/or from different types of sensors can be calibrated. As an example of the latter, if a sensor's sensor type is known and has been analyzed in advance, calibration processor 202 can be configured to provide the sensor abstraction interface 200 with information about device type calibration (for example), so that the sensor abstraction interface 200 can abstract the data correctly and in a calibrated manner. For example, the calibration processor 202 can be configured to include information for calibrating known makes and models of cameras, and/or the like. Calibration processor 202 can also be configured to perform a calibration process to calibrate each individual sensor separately, e.g., at the start of a session (upon a new use, turning on the system, and the like) using that sensor. The user (not shown), for example, can take one or more actions as part of the calibration process, including but not limited to displaying printed material on which a pattern is present. The calibration processor 202 can receive the input from the sensor(s) as part of an individual sensor calibration, such that calibration processor 202 can use this input data to calibrate the sensor input for each individual sensor. The calibration processor 202 can then send the calibrated data from sensor abstraction interface 200 to sensor data preprocessor 204, which can be configured to perform data preprocessing on the calibrated data, including but not limited to reducing and/or eliminating noise in the calibrated data, normalizing incoming signals, and/or the like. The sensor preprocessor 102 can then send the preprocessed sensor data to a SLAM analyzer (not shown).
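
As one hypothetical illustration of calibrating an individual camera from frames of a printed pattern, the following sketch assumes OpenCV and a chessboard pattern; the pattern dimensions, square size, and function name are assumptions and not part of this disclosure.

```python
# Hypothetical per-camera calibration sketch from grayscale frames of a printed
# chessboard pattern, assuming OpenCV.
import cv2
import numpy as np

def calibrate_from_frames(gray_frames, pattern_size=(9, 6), square_size=0.025):
    # 3D coordinates of the pattern corners in the pattern's own plane (z = 0).
    obj = np.zeros((pattern_size[0] * pattern_size[1], 3), np.float32)
    obj[:, :2] = np.mgrid[0:pattern_size[0], 0:pattern_size[1]].T.reshape(-1, 2) * square_size
    obj_points, img_points = [], []
    for gray in gray_frames:
        found, corners = cv2.findChessboardCorners(gray, pattern_size)
        if found:
            obj_points.append(obj)
            img_points.append(corners)
    # Returns RMS reprojection error, camera matrix, and distortion coefficients.
    rms, camera_matrix, dist_coeffs, _, _ = cv2.calibrateCamera(
        obj_points, img_points, gray_frames[0].shape[::-1], None, None)
    return camera_matrix, dist_coeffs, rms
```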

FIG. 2B shows a non-limiting, example, illustrative schematic SLAM analyzer 104, according to at least some embodiments. In some implementations, the SLAM analyzer 104 can include a localization processor 206 and a mapping processor 212. The localization processor 206 of the SLAM analyzer 104 can be operatively coupled to the mapping processor 212 and/or vice-versa. In some implementations, the mapping processor 212 can be configured to create and update a map of an environment surrounding the wearable device (not shown). Mapping processor 212, for example, can be configured to determine the geometry and/or appearance of the environment, e.g., based on analyzing the preprocessed sensor data received from the sensor preprocessor 102. Mapping processor 212 can also be configured to generate a map of the environment based on the analysis of the preprocessed data. In some implementations, the mapping processor 212 can be configured to send the map to the localization processor 206 to determine a location of the wearable device within the generated map.

In some implementations, the localization processor 206 can include a relocalization processor 208 and a tracking processor 210. Relocalization processor 208, in some implementations, can be invoked when the current location of the wearable device 105 (and more specifically, of the one or more sensors 103 associated with the wearable device 105) cannot be determined according to one or more criteria. For example, in some implementations, relocalization processor 208 can be invoked when the current location cannot be determined by processing the last known location with one or more adjustments. Such a situation may arise, for example, if SLAM analyzer 104 is inactive for a period of time and the wearable device 105 moves during this period of time. Such a situation may also arise if tracking processor 210 cannot track the location of the wearable device on the map generated by mapping processor 212.

In some implementations, tracking processor 210 can determine the current location of the wearable device 105 according to the last known location of the device on the map and input information from one or more sensor(s), so as to track the movement of the wearable device 105. Tracking processor 210 can use algorithms such as a Kalman filter, or an extended Kalman filter, to account for the probabilistic uncertainty in the sensor data.
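
The following is a minimal constant-velocity Kalman filter sketch of the kind the tracking processor could use to smooth a tracked position; the state layout and noise parameters are illustrative assumptions, not values taken from this disclosure.

```python
# Minimal constant-velocity Kalman filter sketch for smoothing a tracked 3D position.
import numpy as np

class SimpleKalman:
    def __init__(self, dt=1.0 / 60.0, process_var=1e-3, meas_var=1e-2):
        # State: [x, y, z, vx, vy, vz]; measurement: [x, y, z].
        self.x = np.zeros(6)
        self.P = np.eye(6)
        self.F = np.eye(6)
        self.F[:3, 3:] = dt * np.eye(3)               # position += velocity * dt
        self.H = np.hstack([np.eye(3), np.zeros((3, 3))])
        self.Q = process_var * np.eye(6)
        self.R = meas_var * np.eye(3)

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:3]

    def update(self, measured_position):
        y = measured_position - self.H @ self.x       # innovation
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)      # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(6) - K @ self.H) @ self.P
        return self.x[:3]
```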

In some implementations, the tracking processor 210 can track the wearable device 105 so as to reduce jitter, e.g., by keeping a constant and consistent error through the mapping process, rather than estimating the error at each step of the process. For example, the tracking processor 210 can, in some implementations, use the same or a substantially similar error value when tracking a wearable device 105.

In some implementations, the tracking processor 210 can track the wearable device 105 so as to reduce jitter, e.g., by mixing frame-to-frame with keyframe-to-frame tracking, as described in “Stable Real-Time 3D Tracking using Online and Offline Information” by Vacchetti et al. However, the method described in that paper relies upon manually acquiring keyframes, while for the optional method described herein, the keyframes are created dynamically as needed, as described in greater detail below (for example as described in the discussion of FIGS. 6-8). In some implementations, the tracking processor 210 can also use Kalman filtering to address jitter, in addition to, or in place of, the methods described herein.

In some implementations, the output of localization processor 206 can be sent to mapping processor 212, and the output of mapping processor 212 can be sent to the localization processor 206, so that the determination by each of the location of the wearable device 105 and the map of the surrounding environment can inform the determination of the other.

FIG. 2C shows a non-limiting, exemplary, illustrative schematic mapping module or processor according to at least some embodiments. For example, in some implementations, mapping module or processor 212 can include a fast mapping processor 216, a map refinement processor 218, a calibration feedback processor 220, a map changes processor 222 and a map collaboration processor 224. Each of fast mapping processor 216 and map refinement processor 218 can be in direct communication with each of calibration feedback processor 220 and map changes processor 222 separately. In some implementations, map collaboration processor 224 may be in direct communication with map refinement processor 218.

In some implementations, fast mapping processor 216 can be configured to define a map rapidly and in a coarse-grained or rough manner, using the preprocessed sensor data. Map refinement processor 218 can be configured to refine this rough map to create a more defined map. Map refinement processor 218 can be configured to correct for drift. Drift can occur as the calculated map gradually begins to differ from the true map, due to measurement and sensor errors for example. For example, such drift can cause a circle to not appear to be closed, even if movement of the sensor should have led to its closure. Map refinement processor 218 can be configured to correct for drift, by making certain that the map is accurate; and/or can be configured to spread the error evenly throughout the map, so that drift does not become apparent. In some implementations, each of fast mapping processor 216 and map refinement processor 218 is operated as a separate thread on a computational device (not shown). For such an implementation, localization processor 206 can be configured to operate as yet another thread on such a device.

Map refinement processor 218 performs mathematical minimization of the points on the map, including with regard to the position of all cameras and all three dimensional points. For example, and without limitation, if the sensor data comprises image data, then map refinement processor 218 may re-extract important features of the image data around locations that are defined as being important, for example because they are information-rich. Such information-rich locations may be defined according to landmarks on the map, as described in greater detail below. Other information-rich locations may be defined according to their use in the previous coarse-grained mapping by fast mapping processor 216.
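
A heavily simplified sketch of such a minimization is shown below: it adjusts 3D map points to reduce reprojection error against observed keypoints while holding camera poses fixed. A full refinement (e.g., bundle adjustment) would also optimize the camera poses; the use of SciPy and the function names here are assumptions for illustration only.

```python
# Hypothetical, simplified map-refinement sketch: minimize reprojection error of
# 3D map points, with camera poses held fixed.
import numpy as np
from scipy.optimize import least_squares

def project(point_3d, R, t, K):
    """Project a world point into a camera with rotation R, translation t, intrinsics K."""
    p_cam = R @ point_3d + t
    p_img = K @ p_cam
    return p_img[:2] / p_img[2]

def refine_points(points_3d, observations, K):
    """observations: list of (point_index, R, t, observed_xy) tuples."""
    def residuals(flat_points):
        pts = flat_points.reshape(-1, 3)
        res = []
        for idx, R, t, observed_xy in observations:
            res.extend(project(pts[idx], R, t, K) - observed_xy)
        return np.asarray(res)

    result = least_squares(residuals, points_3d.ravel())
    return result.x.reshape(-1, 3)
```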

The combination of the implementations of FIGS. 2B and 2C may optionally be implemented on three separate threads as follows. The tracking thread would optionally and preferably operate at the fastest processing speed, followed by the fast mapping thread, while the map refinement thread can operate at a relatively slower processing speed. For example, tracking can be operated at a processing speed that is at least five times faster than the processing speed of fast mapping, while the map refinement thread can be operated at a processing speed that is at least 50% slower than the speed of fast mapping. The following processing speeds could be implemented as a non-limiting example: tracking being operated in a tracking thread at 60 Hz, the fast mapping thread at 10 Hz, and the map refinement thread being operated once every 3 seconds.
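As a non-limiting illustration of such a three-thread arrangement, the following Python sketch runs three placeholder functions at roughly 60 Hz, 10 Hz and once every 3 seconds. The functions track_once, fast_map_once and refine_map_once are hypothetical stand-ins for the tracking, fast mapping and map refinement processors, and the scheduling is best-effort rather than real-time; this is not the claimed implementation.

    import threading
    import time

    # Hypothetical stand-ins for the tracking, fast mapping and map refinement steps.
    def track_once(): pass
    def fast_map_once(): pass
    def refine_map_once(): pass

    def run_periodically(step, period_s, stop_event):
        # Call `step` repeatedly at roughly the requested period until stopped.
        while not stop_event.is_set():
            started = time.monotonic()
            step()
            time.sleep(max(0.0, period_s - (time.monotonic() - started)))

    stop = threading.Event()
    threads = [
        threading.Thread(target=run_periodically, args=(track_once, 1 / 60, stop)),    # ~60 Hz tracking
        threading.Thread(target=run_periodically, args=(fast_map_once, 1 / 10, stop)),  # ~10 Hz fast mapping
        threading.Thread(target=run_periodically, args=(refine_map_once, 3.0, stop)),   # refinement every ~3 s
    ]
    for t in threads:
        t.start()
    time.sleep(1.0)   # let the loops run briefly for this example
    stop.set()
    for t in threads:
        t.join()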

Calibration feedback processor 220 can be operated in conjunction with input from one or both of fast mapping processor 216 and map refinement processor 218. For example, the output from map refinement processor 218 can be used to determine one or more calibration parameters for one or more sensors, and/or to adjust such one or more calibration parameters. For the former case, if the sensor is a camera, then output from map refinement processor 218 can be used to determine one or more camera calibration parameters, even if no previous calibration was known or performed. Such output can be used to solve for lens distortion and focal length, because the output from map refinement processor 218 can be configured to indicate where calibration issues related to the camera were occurring, as part of solving the minimization problem by determining a difference between the map before refinement and the map after refinement. Alternatively or additionally, such calibration can feed into the mapping process, whether by fast mapping processor 216 and/or map refinement processor 218.

Map changes processor 222 can also be operated in conjunction with input from one or both of fast mapping processor 216 and map refinement processor 218, to determine what change(s) have occurred in the map as a result of a change in position of the wearable device. Map changes processor 222 can also receive output from fast mapping processor 216, to determine any coarse-grained changes in position. Map changes processor 222 can also (additionally or alternatively) receive output from map refinement processor 218, to determine more precise changes in the map. Such changes can include removal of a previously validated landmark, or the addition of a new validated landmark, as well as changes in the relative location of previously validated landmarks. By "validated landmark" it is meant a landmark whose location has been correctly determined and confirmed, for example by being found at the same location for more than one mapping cycle.

Such changes can be explicitly used to increase the speed and/or accuracy of further localization and/or mapping activities, and/or can be fed to an outside application that relies upon SLAM in order to increase the speed and/or efficacy of operation of the outside application. By "outside application" it is meant any application that is not operative for performing SLAM.

As a non-limiting example of feeding this information to the outside application, such information can be used by the application, for example to warn the user that one of the following has occurred: a particular object has been moved; a particular object has disappeared from its last known location; or a new specific object has appeared. Such a warning can be determined according to the available information from the last time the scene was mapped.

Map changes processor 222 can have a higher level understanding for determining that a set of coordinated or connected landmarks moved or disappeared, for example to determine a larger overall change in the environment being mapped. Again, such information may be explicitly used to increase the speed and/or accuracy of further localization and/or mapping activities, and/or may be fed to an outside application that relies upon SLAM in order to increase the speed and/or efficacy of operation of the outside application.

Map collaboration processor 224 can receive input from map refinement processor 218 in order for a plurality of SLAM analyzers, in conjunction with a plurality of wearable devices, to create a combined, collaborative map. For example, a plurality of users, wearing a plurality of wearable devices implementing such a map collaboration processor 224, can receive the benefit of pooled mapping information over a larger area. As a non-limiting example only, such a larger area can include an urban area, including at least outdoor areas, and also including public indoor spaces. Such a collaborative process can increase the speed and efficiency with which such a map is built, and can also increase the accuracy of the map, by receiving input from a plurality of different sensors from different wearable devices. While map collaboration processor 224 can also receive and implement map information from fast mapping processor 216, for greater accuracy, data from map refinement processor 218 is used.

Optionally, computational device 107 from FIG. 1A comprises a hardware processor configured to perform a predefined set of basic operations in response to receiving a corresponding basic instruction selected from a predefined native instruction set of codes, and memory. SLAM analyzer 104 optionally comprises a first set of machine codes selected from the native instruction set for receiving sensor data, which may be optical sensor data. SLAM analyzer 104 optionally comprises a second set of machine codes selected from the native instruction set for operating a localization module (such as the instructions for localization processor 206), a third set of machine codes selected from the native instruction set for operating a fast mapping module (such as the instructions for fast mapping processor 216); and a fourth set of machine codes selected from the native instruction set for operating a map refinement module (such as the instructions for map refinement processor 218). Each of the first, second, third and fourth sets of machine code is stored in the memory of computational device 107.

FIG. 3A shows a schematic of another non-limiting example system according to at least some embodiments of the present invention, relating to one or more sensors communicating with a computational device, shown as a system 300. As shown, system 300 includes a computational device 302 in communication with one or more sensors 318. Sensor(s) 318 may comprise any type of sensor as described in the present disclosure, or a plurality of different types of sensors.

Computational device 302 preferably operates a sensor preprocessor 316, which may optionally operate as previously described for other sensor preprocessors. Preferably, sensor preprocessor 316 receives input data from one or more sensors 318 and processes the input data to a form which is suitable for use by SLAM analyzer 314. SLAM analyzer 314 may operate as previously described for other SLAM analyzers.

SLAM analyzer 314 preferably comprises a mapping module or processor 304, which may operate as previously described for other mapping modules, and thus perform mapping functions as previously described. SLAM analyzer 314 also preferably includes a relocalization module or processor 310 and a tracking module or processor 312. While in some embodiments relocalization module 310 and tracking module 312 can be separate modules, relocalization module 310 and tracking module 312 may be combined in a single module.

Relocalization module 310 may operate as previously described for other relocalization modules in the disclosure, so as to determine the location of system 300 (or rather of sensor(s) 318) in case such a location cannot be determined from a previously known location of same and data from sensor(s) 318. Furthermore, tracking module 312 may optionally operate as previously described for other tracking modules, to determine the location of system 300 (or rather of sensor(s) 318) from a previously known location of same and data from sensor(s) 318.

FIG. 3B shows a schematic of a non-limiting example of a computational device operating at least some components of the system according to at least some embodiments of the present disclosure. System 302 includes some of the same components as FIG. 3A (which are shown with the same numbering). SLAM analyzer 314 of system 302 preferably features an obstacle avoidance module or processor 320, which is optionally and preferably operated/controlled by mapping module or processor 304. Obstacle avoidance module 320 is configured to detect and map potential obstacles in the real, physical world, so as to assist the user of the wearable device 105 in avoiding potential obstacles. By tracking validated (i.e., actual) landmarks and corresponding geometry thereof in the real, physical world, mapping processor 212 can provide such information to the obstacle avoidance processor, enabling the obstacle avoidance processor to identify such landmarks as potential obstacles. The obstacle avoidance processor can thus be used to determine the distance of the landmarks to the user and/or a distance from the user to sensor(s) 103 that are providing the input data used for mapping.

In some implementations, the output of SLAM analyzer 104 (which may include information about the potential obstacles) is passed through an application interface to a VR (virtual reality) application. Optionally, both the application interface 322 and VR application 324 are operated by computational device 107 (e.g., for either or both of the schematics shown in FIGS. 3A and 3B). The VR application can use the mapping and localization information to feed into the map of the virtual world, as well as the location of the representation of the user, or "avatar", on such map. In addition, the VR application 324 can use information regarding potential obstacles as input to the map of the virtual world. For example, the VR application 324 can display a wall in the virtual world that corresponds to the location and geometry of a wall in the physical world, according to the information received. VR application 324 could also optionally receive other types of information, for example regarding the location and movement of an object held in the user's hand (not shown), which would be extraneous to SLAM analyzer 314.

FIG. 3C shows a schematic of another non-limiting example of a computational device operating at least some components of the system according to at least some embodiments of the present invention, shown as a system 330. As shown, system 330 includes some of the same components as FIGS. 3A and 3B, which are shown with the same numbering. System 330 preferably further includes, as part of SLAM analyzer 314, a real object locator 328. Real object locator 328 is optionally and preferably separate from obstacle avoidance module 320, in order to provide more detailed information about a specific object. In some embodiments, real object locator 328 provides such precise information so as to provide a more realistic analysis of the geometry and appearance of objects, which may be required for operation with an AR (augmented reality) application 326. Because augmented reality mixes the display of real and rendered virtual objects, interactions between the real and virtual objects are configured to be as realistic as possible, particularly when the user interacts with real and virtual objects through the AR world provided. Real object locator 328 preferably provides sufficient information for interactions between the AR components and the real object, again so as to be as realistic as possible. For example, light "shone" on a rendered virtual object should be similar to the light that would be shone, or is shone, on a real object in that position according to the light in the room. Incorrect lighting conditions for virtual objects result in less realism for the user of the AR application (i.e., reduce the realism of interactions between real and virtual objects).

In some embodiments, SLAM is not necessarily required to determine the characteristics of real world objects, if the characteristics of that object are known in advance. For example, for a mechanic using AR application 326 to assist in repair of a car engine, the mechanic could presumably reference the specification for that car engine, which may then be used by AR application 326 to reproduce the basic object. However, any changes from the standard basic car engine, such as the presence of damage or of an extraneous object, or the absence of an expected object, may be handled by real object locator 328.

Additionally, mapping module 304 may feature a map refinement module (as previously described but not shown). Such a map refinement module is particularly preferred for AR type applications, because drift in the map can be more rapidly apparent with such applications, due to the mix of virtual and real world objects.

FIG. 4 shows a non-limiting exemplary method for performing SLAM according to at least some embodiments of the present disclosure. As shown, a user moves 402 (e.g., his head and/or other body part/body) wearing the wearable device, such that sensor data is received from one or more sensors at 404. The sensor data received is related to such movement. For this non-limiting example, the wearable device is assumed to be a headset of some type that is worn on the head of the user. The headset is assumed to contain one or more sensors, such as a camera for example.

At 404, it is determined whether there is a last known location of the wearable device according to previous sensor data. If not, then relocalization is preferably performed at 406 according to any method described herein, in which the location of the wearable device is determined again from sensor data. For example, if the sensor is a camera, such that the sensor data is a stream of images, relocalization could optionally be used to determine the location of the wearable device from the stream of images, optionally without using the last known location of the wearable device as an input. Relocalization in this non-limiting example is optionally performed according to the RANSAC algorithm, described for example in "Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography" by Fischler and Bolles (available from http://dl.acm.org/citation.cfm?id=358692). For this algorithm, as described in greater detail below, the images are decomposed into a plurality of features. The features are considered in groups of some predetermined number, to determine which features are accurate. The RANSAC algorithm is robust in this example because no predetermined location information is required.

In 408, once the general location of the wearable device is known, then tracking is performed. Tracking is used to ascertain the current location of the wearable device from general location information, such as the last known location of the wearable device in relation to the map, and the sensor data. For example, if the sensor data is a stream of images, then tracking is optionally used to determine the relative change in location of the wearable device on the map from the analyzed stream of images, relative to the last known location on the map. Tracking in this non-limiting example may optionally be performed according to non-linear minimization with a robust estimator, in which case the last known location on the map may optionally be used for the estimator. Alternatively, tracking may optionally be performed according to the RANSAC algorithm, or a combination of the RANSAC algorithm and non-linear minimization with a robust estimator.

After tracking is completed for the current set of sensor data, the process preferably returns to 402 for the next set of sensor data, as well as continuing at 410. Preferably, as described herein, the tracking loop part of the process (repetition of 402-408) operates at 60 Hz (but other frequencies are within the scope of the present disclosure).

At 410, coarse-grained, fast mapping is preferably performed as previously described. If the sensor data is a stream of images, then preferably selected images (or "keyframes") are determined as part of the mapping process. During the mapping process each frame (the current frame or an older one) may optionally be kept as a keyframe. Not all frames are kept as keyframes, as this would slow down the process. Instead, a new keyframe is preferably selected from frames showing a poorly mapped or unmapped part of the environment. One way to determine that a keyframe shows a poorly mapped or unmapped part of the environment is when many new features appear (features for which correspondences do not exist in the map). Another way is to compute geometrically the path of the camera. When the camera moves so that the view field partially leaves the known map, preferably a new keyframe is selected.
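As a non-limiting sketch of the first criterion (many new features without map correspondences), the following heuristic flags a frame as a keyframe candidate when the fraction of unmatched features is large; the 40% threshold is an assumption chosen for illustration and is not taken from the disclosure.

    def should_add_keyframe(n_features, n_matched_to_map, new_feature_ratio=0.4):
        # Flag the frame as a keyframe candidate when a large fraction of its
        # features has no correspondence in the map (poorly mapped/unmapped area).
        if n_features == 0:
            return False
        unmatched = n_features - n_matched_to_map
        return (unmatched / n_features) >= new_feature_ratio

    # Example: 200 features detected, only 90 matched to existing map points.
    assert should_add_keyframe(200, 90) is True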

Optionally and preferably, 408 and 410 are performed together, in parallel, or at least receive each other's output as each stage is performed. The impact of mapping and tracking on each other is important for the "simultaneous" aspect of SLAM to occur.

At 412, the map may be refined, to increase the precision of the mapping process, which may be performed according to bundle adjustment, in which the coordinates of a group or "bundle" of three-dimensional points are simultaneously refined and optimized according to one or more criteria (see for example the approaches described in B. Triggs, P. McLauchlan, R. Hartley, A. Fitzgibbon (1999), "Bundle Adjustment—A Modern Synthesis", ICCV '99: Proceedings of the International Workshop on Vision Algorithms, Springer-Verlag, pp. 298-372). Such a refined map is preferably passed back to the relocalization, tracking and fast mapping processes.

FIG. 5 shows a non-limiting example of a method for performing localization according to at least some embodiments of the present disclosure. It is worth noting that the method shown in FIG. 5 may be performed for initial localization, when SLAM is first performed, and/or for relocalization. While the method may also be performed for tracking (as described herein), it may be too computationally expensive and/or slow, depending upon the computational device being used. For example, the method shown in FIG. 5, in some embodiments, may operate too slowly or require computational resources which are not presently available on current smartphones.

With respect to FIGS. 5-7, and for the purpose of illustration only (without intending to be limiting), the SLAM method is assumed to be performed on sensor data which includes a plurality of images from a camera. Accordingly, at 502, a plurality of images, such as a plurality of video frames, is obtained, which may optionally be preprocessed (as described herein), such that the video data is suitable for further analysis. At 504, one or more image feature descriptors are determined for each feature point in each frame. A feature point may be determined according to information provided by that feature, such that an information-rich portion of the image may optionally be determined to be a feature. Whether a portion of the image is information-rich may optionally be determined according to the dissimilarity of that portion of the image from the remainder of the image. For example, and without limitation, a coin on an otherwise empty white surface would be considered to be the information-rich part of the image.

Other non-limiting examples of information-rich portions of an image include boundaries between otherwise homogenous objects. As used herein, the term "feature point" may optionally relate to any type of image feature, including a point, an edge and so forth.

As part of this process, the frames are searched for a plurality of feature points. Optionally, such searching is performed using the FAST analytical algorithm, as described for example in "Faster and better: a machine learning approach to corner detection", by Rosten et al, 2008 (available from https://arxiv.org/pdf/0810.2434). The FAST algorithm optionally uses the newly selected keyframe(s) to compare the feature points in that keyframe to the other, optionally neighboring, keyframes, by triangulation for example.
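By way of illustration only, FAST corner detection is available in the OpenCV library; the following sketch detects FAST keypoints on a synthetic image (the image content and the threshold value are assumptions made only so that the example is self-contained).

    import numpy as np
    import cv2

    # Synthetic textured image, used only so that the example runs without input files.
    img = np.random.randint(0, 256, (480, 640), dtype=np.uint8)

    fast = cv2.FastFeatureDetector_create(threshold=20, nonmaxSuppression=True)
    keypoints = fast.detect(img, None)   # FAST corner keypoints (feature points)
    print(len(keypoints), "FAST feature points detected")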

For each feature point, a descriptor, which is a numerical representation of the appearance of the surrounding portion of the image around the feature point, may be calculated, with an expectation that two different views of the same feature point will lead to two similar descriptors. In some embodiments, the descriptor may optionally be calculated according to the standard ORB algorithm, for example as described in "ORB: an efficient alternative to SIFT or SURF" (available from http://www.willowgarage.com/sites/default/files/orb_final.pdf); and in "ORB-SLAM2: an Open-Source SLAM System for Monocular, Stereo and RGB-D Cameras" by Mur-Artal and Tardos, 2016 (available from https://arxiv.org/abs/1610.06475).
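As a non-limiting illustration, ORB keypoints and their binary descriptors can be computed with OpenCV as follows; the synthetic image and the parameter values are assumptions for the example only.

    import numpy as np
    import cv2

    img = np.random.randint(0, 256, (480, 640), dtype=np.uint8)   # placeholder frame

    orb = cv2.ORB_create(nfeatures=1000)
    keypoints, descriptors = orb.detectAndCompute(img, None)
    # `descriptors` is an N x 32 array of bytes: one 256-bit binary descriptor
    # per detected feature point, suitable for Hamming-distance matching.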

Next, an updated map is received at 506, which features a plurality of landmarks (which, as previously described, are preferably validated landmarks). At 508, the descriptors of at least some features in at least some frames are compared to the landmarks of the map. The landmarks of the map are preferably determined according to keyframes, which may optionally be selected as previously described. To avoid requiring comparison of all features to all landmarks, descriptors and/or images may be sorted, for example according to a hash function, into groupings representing similarity, such that only those descriptors and/or images that are likely to be similar (according to the hash function) are compared.

In such embodiments, each feature point may include a descriptor, which is a 32-byte string (for example). Given that the map contains a plurality of landmarks, comparing each descriptor to all landmarks, as noted above, requires a great deal of computational processing and resources. Accordingly, a vocabulary tree may be used to group descriptors according to similarity: similar descriptors may be assigned the same label or visual word. Accordingly, for each keyframe in the map, all labels associated with that keyframe may be considered (each label being related to a feature point on that map). For each label or visual word, in some embodiments, a list of keyframes containing that label may be made. Then, for a new frame, the visual words may be computed. Next, the list of keyframes in which similar visual words appear is reviewed, with the subject keyframes being a set of candidates for matching to one another. The vocabulary tree therefore enables more efficient assignment of the visual words, which, in turn, enables sets of candidate keyframes for matching to be more efficiently selected. These candidates may then be used more precisely to relocalize. Non-limiting examples of implementations of such a method are described in "Bags of Binary Words for Fast Place Recognition in Image Sequences" (by Gálvez-López and Tardós, IEEE Transactions on Robotics, 2012, available from http://ieeexplore.ieee.org/document/6202705/) and "Scalable Recognition with a Vocabulary Tree" (by Stewenius and Nister, 2006, available from http://dl.acm.org/citation.cfm?id=1153548). One of skill in the art will appreciate that this method may also be used for tracking, for example, a specific object, or alternatively, for tracking generally as described herein.
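A minimal, non-limiting sketch of such an inverted index follows. The function assign_visual_word is a hypothetical stand-in for a trained vocabulary-tree lookup (here replaced by a simple hash into a fixed number of buckets), so the sketch only illustrates the indexing and candidate-selection structure, not the quantization quality of a real vocabulary tree.

    from collections import defaultdict

    def assign_visual_word(descriptor, num_words=10000):
        # Stand-in for a vocabulary-tree lookup: hash the raw descriptor bytes into
        # one of `num_words` buckets. A real system would quantize with a trained tree.
        return hash(bytes(bytearray(descriptor))) % num_words

    class InvertedIndex:
        def __init__(self):
            self.word_to_keyframes = defaultdict(set)

        def add_keyframe(self, keyframe_id, descriptors):
            for d in descriptors:
                self.word_to_keyframes[assign_visual_word(d)].add(keyframe_id)

        def candidates(self, descriptors):
            # Keyframes sharing at least one visual word with the query frame.
            result = set()
            for d in descriptors:
                result |= self.word_to_keyframes.get(assign_visual_word(d), set())
            return result

    # Example usage with a dummy 32-byte descriptor.
    index = InvertedIndex()
    index.add_keyframe(0, [bytes(range(32))])
    print(index.candidates([bytes(range(32))]))   # {0}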

At 510, outlier correspondences may be eliminated, for example according to the statistical likelihood of the features and the landmarks being correlated, and a pose (position and orientation) is calculated, preferably simultaneously. Optionally, a method such as RANSAC may be implemented to eliminate such outliers and to determine a current pose, with such methods performing both functions simultaneously. The pose of the sensor reporting the data may be calculated according to the correspondences between the features on the map and the landmarks that were located with the sensor data. RANSAC may optionally be implemented according to OpenCV, which is an open source computer vision library (available at http://docs.opencv.org/master/d9/d0c/group_calib3d.html#gsc.tab=0).
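A non-limiting sketch of this step using OpenCV's RANSAC-based perspective-n-point solver follows; the camera intrinsics, the synthetic landmarks and the reprojection-error threshold are assumptions made only so that the example runs end to end.

    import numpy as np
    import cv2

    # Synthetic data: project known 3D landmarks with a known pose, then recover
    # that pose (and the inlier set) with RANSAC-based PnP.
    K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
    dist = np.zeros(5)
    object_points = np.random.uniform(-1.0, 1.0, (50, 3)).astype(np.float32)
    object_points[:, 2] += 5.0                      # keep landmarks in front of the camera
    rvec_true = np.array([[0.1], [0.2], [0.0]])
    tvec_true = np.array([[0.0], [0.0], [1.0]])
    image_points, _ = cv2.projectPoints(object_points, rvec_true, tvec_true, K, dist)
    image_points = image_points.reshape(-1, 2).astype(np.float32)

    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        object_points, image_points, K, dist,
        reprojectionError=3.0, iterationsCount=100)
    R, _ = cv2.Rodrigues(rvec)   # rotation part of the estimated pose
    print(ok, tvec.ravel())      # should approximate the true translation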

FIG. 6 shows another non-limiting exemplary method for performing localization according to at least some embodiments of the present disclosure. The method shown, according to some embodiments, is computationally faster and less expensive than the method of FIG. 5. Furthermore, the method of FIG. 6 is computationally suitable for operation on current smartphones. Optionally, the method described herein may be used for tracking, where the previous known location of the sensor providing the sensor data is sufficiently well known to enable a displacement estimate to be calculated, as described in greater detail below.

At 602, a keyframe is selected from a set of keyframes in the map (optionally a plurality of keyframes is selected). The selection of the keyframe may optionally be performed either around FAST feature points (as determined by the previously described FAST algorithm) or around reprojection locations of map landmarks with respect to the features on the keyframe(s). This provides a relative location of the features in the keyframe(s) with their appearance according to the pixel data. For example, a set of landmarks that are expected to be seen in each keyframe is used to determine the features to be examined.

At 604, a displacement estimate on the map may be determined, which is an estimate of the current location of the sensor providing the sensor data, which (as in earlier examples) may be a camera providing a plurality of images, according to the previously known position. For example, an assumption can be made of either no motion, or of constant velocity (i.e., a constant rate of motion). In another example, performed with an IMU, sensor data may be provided in terms of rotation (and optionally other factors), which could be used to determine a displacement estimate.
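The following non-limiting sketch shows the two simplest displacement estimates mentioned above (no motion versus constant velocity); representing the pose as a 3-vector of position is an assumption made only for brevity.

    import numpy as np

    def predict_position(prev_position, prev_prev_position=None):
        # With only one previous pose, assume no motion; with two, assume the
        # displacement between them repeats (constant velocity).
        if prev_prev_position is None:
            return prev_position.copy()
        return prev_position + (prev_position - prev_prev_position)

    p1 = np.array([0.0, 0.0, 0.0])
    p2 = np.array([0.1, 0.0, 0.0])
    print(predict_position(p2, p1))   # [0.2, 0.0, 0.0] under constant velocity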

At 606, one or more patches of the keyframe(s) are warped according to the displacement estimate around each feature of the keyframe(s). Warping may optionally be performed according to homography, exemplary methods for which are described in greater detail below. Accordingly, the number of features may have a greater effect on computational resources than the number of keyframes, as the number of patches ultimately determines the resources required. According to some embodiments, the displacement estimate includes an estimation of translocation distance and also of rotation, such that the keyframe(s) is adjusted accordingly.

At 608, the NCC (normalized cross-correlation) of the warped keyframes is preferably computed. The displacement estimate may then be adjusted according to the output of the NCC process at 610. Such an adjusted estimate may yield a location, or alternatively may result in the need to perform relocalization, depending upon the reliability of the adjusted displacement estimate. The NCC output may also be used to determine the reliability of the adjusted estimate.
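For reference, a minimal normalized cross-correlation of two equally sized patches can be computed as in the following non-limiting sketch; values near 1 indicate a good match, so the displacement candidate whose warped patch scores highest would be preferred.

    import numpy as np

    def ncc(patch_a, patch_b):
        # Normalized cross-correlation of two equally sized patches, in [-1, 1].
        a = patch_a.astype(np.float64).ravel()
        b = patch_b.astype(np.float64).ravel()
        a -= a.mean()
        b -= b.mean()
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        return float(a @ b / denom) if denom > 0 else 0.0

    patch = np.random.randint(0, 256, (8, 8))
    print(ncc(patch, patch))          # 1.0 for identical patches
    print(ncc(patch, 255 - patch))    # -1.0 for inverted patches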

FIG. 7 shows a non-limiting exemplary method for updating system maps according to map refinement, according to at least some embodiments. At 702, the refined map is received, which can be refined according to bundle adjustment as previously described. At 704, the refined map is used to update the map at the relocalization and tracking processors, and therefore forms the new base map for the fast mapping process. At 706, the map is then updated by one or more selected keyframe(s), for example by the fast mapping process.

FIG. 8 shows a non-limiting, exemplary illustrative method for validating landmarks according to at least some embodiments. For example, at 802, a selected keyframe is applied to the currently available map in order to perform tracking. At 804, one or more validated landmarks are located on the map according to the applied keyframe. At 806, it is determined whether a validated landmark can be located on the map after application of the keyframe. At 810, if the landmark cannot be located, then it is no longer validated. In some implementations, failing to locate a validated landmark once may not cause the landmark to be invalidated; rather, the landmark may be invalidated when a statistical threshold is exceeded, indicating that the validated landmark failed to be located a sufficient number and/or percentage of times. According to this threshold, the validated landmark may no longer be considered to be validated. At 808, if the landmark is located, then the landmark is considered to be a validated landmark.
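A non-limiting sketch of such a statistical validation rule follows; the miss-ratio threshold and the minimum number of observations are assumptions chosen for illustration.

    class LandmarkValidator:
        # Track how often a landmark is found; invalidate it only when the miss
        # ratio exceeds a threshold over enough observations.
        def __init__(self, max_miss_ratio=0.5, min_observations=10):
            self.hits = 0
            self.misses = 0
            self.max_miss_ratio = max_miss_ratio
            self.min_observations = min_observations

        def observe(self, found):
            # Record one tracking attempt; return True while the landmark remains validated.
            if found:
                self.hits += 1
            else:
                self.misses += 1
            total = self.hits + self.misses
            if total < self.min_observations:
                return True
            return (self.misses / total) <= self.max_miss_ratio

    v = LandmarkValidator()
    states = [v.observe(found) for found in [True] * 4 + [False] * 8]
    print(states[-1])   # False: too many misses, so the landmark is no longer validated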

FIG. 9A shows an illustrative, exemplary, non-limiting method for applying VR to medical therapeutics according to at least some embodiments of the present disclosure, for assisting an amputee to overcome phantom limb syndrome. In this non-limiting example, the amputee is referred to as the user. As shown, at stage 1, the body of the user or a portion thereof, such as the torso and/or a particular limb, may be scanned. Such scanning may be performed in order to create a more realistic avatar for the user to view in the VR environment, so that, for example, when the user "looks down" in the VR environment, he/she can see body parts that realistically appear to "belong" to the user's own body.

In some implementations, at stage 2, a familiar environment for the user is scanned, which may be performed in order to create a more realistic version of the environment for the user in the VR environment. The user may then look around the VR environment and see virtual objects that correspond in appearance to real objects with which the user is familiar.

In some implementations, the user enters the VR environment at stage 3, for example by wearing a wearable such as a headset with a screen on the head of the user. The wearable may be constructed as described herein, with one or more sensors to provide data such that movement can be detected, and such that SLAM may optionally be performed as described herein.

In some implementations, at stage 4, the user "views" the phantom limb, that is, the limb that was amputated, as still being attached to the body of the user. For example, if the amputated limb was the user's left arm, then the user sees their left arm as still attached to their body as a functional limb, within the VR environment. In order to enable the amputated limb to appear activated and used, the user's functioning right arm may be used to create a "mirror" left arm. In this example, when the user moves his or her right arm, the mirrored left arm appears to move and can be viewed as moving in the VR environment. In some embodiments, SLAM may be, and preferably is, used to analyze the sensor data, and to correctly locate the parts of the user's body that are visible, as well as to correctly locate the position of the user's body in the VR environment.

If a familiar environment for the user was scanned previously, then the VR environment can be rendered to appear to be that familiar environment. Creating the familiar environment can lead to powerful therapeutic effects for the user, for example as described below in regard to reducing phantom limb pain.

In some implementations, at stage 5, the ability to view the phantom limb may be, and preferably is, incorporated into one or more therapeutic activities, such as the "Simon says" activity described with regard to FIG. 9B, in which the user is asked to mimic the activities of a viewed second player in the VR environment.

In some embodiments, this method may be used to reduce phantom limb pain, in which an amputee feels strong pain that is associated with the missing limb. While such pain has been successfully treated with mirror therapy, in which the amputee views the non-amputated limb in a mirror (see for example Kim and Kim, "Mirror Therapy for Phantom Limb Pain", Korean J Pain. 2012 October; 25(4): 272-274), the VR environment described herein can provide a more realistic and powerful way for the user to view and manipulate the non-amputated limb, and hence to reduce phantom limb pain.

FIG. 9B shows another illustrative, exemplary, non-limiting method for applying VR to medical therapeutics according to at least some embodiments of the present disclosure; specifically, for providing a therapeutic environment to a subject who has suffered a stroke (e.g., as a non-limiting example of a brain injury). In this non-limiting example, the subject is encouraged to play a game of "Simon says" in order to treat hemispatial neglect, although of course other treatment methods may be employed instead. In the game of "Simon says", one player (which in this example could be a VR avatar) performs an action which the other players must copy, but only if the "Simon" player says "Simon says to (perform the action)". Of course, this requirement may be dropped for this non-limiting example, which is described only in terms of the user viewing and copying actions.

Stages 1-3 of FIG. 9A may be performed for this method as well. In some implementations, only stage 3 may be performed, so that the user enters the VR environment. In stage 4, the user can view an avatar, which is optionally another player (such as a therapist) or alternatively is a non-player character (NPC) generated by the VR system. Preferably, the user perceives the avatar as standing in front of him or her, and as facing the user. The user has his or her own avatar, which represents those parts of the user's body that are normally visible to the user according to the position of the user's head and body. This avatar is referred to in this non-limiting example as the user's avatar.

In stage 5, the avatar initiates an action, which the user is to mimic with the user's own body. In stage 6, the user copies, or at least attempts to copy, the action of the avatar. The user can see the avatar, as well as those parts of the user's avatar that are expected to be visible according to the position of the user's head and body.

Optionally, for stages 5 and 6, the user's avatar can also be placed in front of the user, for example next to the "Simon" avatar. The user can then see both the Simon avatar, whose visual action(s) the user copies, and how the user's body is actually performing those actions with the user's avatar.

In stage 7, if the user fails to correctly copy the action of the Simon avatar, that avatar preferably repeats the action. This process may optionally continue for a predetermined number of rounds or until the user achieves at least one therapeutic goal.

In stage 8, the ability of the user to perform such actions is optionally scored.

FIGS. 10A and 10B illustrate two examples of non-limiting methods for applying AR to medical therapeutics according to at least some embodiments of the present disclosure.

FIG. 10A shows an illustrative, exemplary, non-limiting method for applying AR to medical therapeutics according to at least some embodiments of the present disclosure, for assisting an amputee to overcome phantom limb syndrome. In this non-limiting example, the amputee is referred to as the user. Stages 1 and 2 may be identical to stages 1 and 2 of FIG. 9A. However, stage 2 may only be used to scan one or more real world objects that are familiar to the user, rather than the entire environment.

In stage 3, the user enters the AR environment, for example by wearing a wearable such as a headset with a screen. The wearable may be constructed as described herein, with one or more sensors to provide data such that movement can be detected, and such that SLAM can optionally be performed as described herein.

In stage 4, the user "views" the prosthesis (although alternatively the user could view the phantom limb and perform similar activities in the AR environment as for the VR environment described above).

In this example, when the user moves his or her prosthesis, the prosthesis appears to move and can be viewed as moving in the AR environment. SLAM may be, and preferably is, used to analyze the sensor data, and to correctly locate the parts of the user's body that are visible, as well as to correctly locate the position of the user's body in the AR environment.

In stage 5, the user performs an activity in the AR environment with the prosthesis, for example to grasp and manipulate an overlaid virtual object, or to perform a "Simon says" type of therapeutic activity, or a combination thereof.

Optionally and preferably, the methods of FIGS. 9A and 10A can be used sequentially, both to help the amputee overcome phantom limb pain and also to help increase the ability of the amputee to use his or her prosthesis. The methods may also be used in repeated cycles.

FIG. 10B shows another illustrative, exemplary, non-limiting method for applying AR to medical therapeutics according to at least some embodiments of the present disclosure, for providing a therapeutic environment to a subject who has suffered a stroke/brain injury. In this non-limiting example, the subject is encouraged to play the game of "Simon says" in order to treat hemispatial neglect. For this example, the "Simon" of the game may be a real person whose actions the user can view; alternatively, the "Simon" may be an avatar, generated by the AR system and overlaid onto the viewed real physical environment.

Stages 1-3 of FIG. 9B may optionally be performed for this method as well, and optionally only stage 3 of FIG. 9B is performed, so that the user enters the AR environment.

At stage 4, the user views the "Simon" of the game, which is optionally another player (such as a therapist) or alternatively is a non-player character (NPC) generated by the AR system. Preferably, the user perceives the Simon as standing in front of him or her, and as facing the user. The user is preferably able to see his/her own body parts through the headset, or alternatively may optionally view an avatar as described above.

At stage 5, the Simon initiates an action, which the user is to mimic with the user's own body. The action of the Simon may optionally include grasping a virtual object overlaid over the real physical environment, for example, although optionally any action may be performed by the Simon. In stage 6, the user copies, or at least attempts to copy, the action of the Simon. The user can see the Simon, as well as those parts of the user's body that are expected to be visible according to the position of the user's head and body.

Optionally, for stage 6, a representation of the user's body, as an avatar, may also be placed in front of the user, for example next to the Simon. The user could then see both the Simon, whose visual action(s) the user copies, and how the user's body is actually performing those actions with the user's avatar.

At stage 7, if the user fails to correctly copy the action of the Simon, the Simon preferably repeats the action. This process may optionally continue for a predetermined number of rounds or until the user achieves at least one therapeutic goal.

In stage 8, the ability of the user to perform such actions may be scored.

FIG. 11 shows an exemplary, non-limiting flow diagram for performing SLAM according to at least some embodiments. As shown, a SLAM process 1100 begins with inputs from at least one sensor, shown in this non-limiting example as a camera 1102 and an IMU (inertial measurement unit) 1104. Camera 1102 is preferably an optical camera, which may optionally be monocular.

Data from camera 1102 and IMU 1104 are then passed to a map initialization checking process 1106, which determines whether the map has been initialized. If not, the data is passed to a map initialization process 1108. Otherwise, the data is passed to a tracking process 1110. A tracking checking process 1112 determines whether tracking has been lost. If tracking has been lost, then the data is sent to a relocalization process 1114, and then back to tracking process 1110.

Tracking process 1110 then preferably performs tracking as follows. The pose is predicted, optionally by integrating IMU data between the previous time that the pose was known and the predicted time. Next, a local map is optionally selected. For example, such a local map may optionally be constructed dynamically, based on the predicted pose, by making a list of potentially visible map points.

The local map is then tracked, for example with NCC (normalized cross-correlation), LK (Lucas-Kanade), or a combination thereof. For NCC, a patch is warped around a keyframe feature and compared to the current frame, in the area where the selected feature is expected to be present. LK involves tracking a feature from the previous frame to the next frame, which reduces or eliminates jittering.
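As a non-limiting illustration, pyramidal Lucas-Kanade tracking is available in OpenCV; the sketch below tracks features between two synthetic frames. The frames, the feature-detector settings and the window size are assumptions chosen only so that the example is self-contained.

    import numpy as np
    import cv2

    # Two synthetic frames: the second is the first shifted by a few pixels.
    prev_img = np.random.randint(0, 256, (480, 640), dtype=np.uint8)
    curr_img = np.roll(prev_img, shift=(2, 3), axis=(0, 1))

    prev_pts = cv2.goodFeaturesToTrack(prev_img, maxCorners=200, qualityLevel=0.01, minDistance=7)
    next_pts, status, err = cv2.calcOpticalFlowPyrLK(
        prev_img, curr_img, prev_pts, None, winSize=(21, 21), maxLevel=3)

    tracked = next_pts[status.ravel() == 1]   # features tracked into the current frame
    print(len(tracked), "features tracked frame-to-frame")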

Next, the image data (frame) is analyzed to determine whether it is a keyframe candidate by a keyframe analyzer process 1116. If so, then the frame is passed to a mapping process 1118, which begins with a keyframe creation process 1120. Once the keyframe has been created, a process 1122 adds the keyframe to the map. The keyframe is added by inserting the keyframe into the graph and updating the neighbor pointers. Once the keyframe has been added, a map update process is run in 1124.

After the map has been updated, a loop closure process 1126 is optionally run. As shown, loop closure process 1126 starts with a loop detection process 1128, to see whether a loop has been detected. Next, if a loop is detected in 1130, a loop optimization process 1132 is performed.

Optionally, the process of FIG. 11 is implemented as follows, with regard to the three modules of tracking, mapping and loop closing. Each of the modules is optionally run in a separate thread. Communication between them is optionally performed using messages: an idle thread waits for a message and then processes it; if there is no message waiting, the thread sleeps.

FIGS. 12A-12C show a detailed, exemplary, non-limiting flow diagram for performing SLAM according to at least some embodiments. FIG. 12A shows the overall diagram of a system 1200, while FIGS. 12B and 12C show two portions of the overall diagram. The below explanation is provided with regard to these latter two diagrams for the sake of clarity. Numbers that are identical to those of FIG. 11 have the same or similar function.

FIG. 12B shows the top portion of FIG. 12A. As shown, map initialization 1108 features a keypoints reference frame detection process 1202 to select the first frame as the reference frame. A process 1204 includes detecting the points on the reference frame, for example by using the LK process as previously described.

Next, in process 1206, NCC is optionally used to verify the validity of the tracked points.

The NCC-verified matches are optionally passed to an essential matrix RANSAC process or a homography RANSAC process to calculate the pose, in process 1208. An initial map is created in a process 1210.

As previously described, if tracking is lost, then relocalization process 1114 is optionally performed. A process 1212 detects features and computes descriptors. Next, a process 1214 queries the inverted index for candidates. A process 1216 verifies the geometry of the candidates.

If the map has been initialized and tracking has not been lost, then tracking process 1110 is performed, including pose prediction 1218. Next, a local map is determined in a process 1220. Local map tracking is then performed, optionally with a process 1222 that features NCC and LK. Then it is determined whether a frame is optionally a new keyframe in a process 1224. In some preferred embodiments, the fast mapping processor discussed above can be used to create new map points while tracking process 1110 is performed. In some cases, tracking process 1110 can detect that using the same or a subset of the same sensor data. For example, the optical sensor can move to an unknown or lesser-known portion of the map, and the fast mapping processor can determine feature descriptors and use them as map points for tracking. During the mapping process described below, the new feature descriptors can serve as input for updating the map. In this way, both tracking and mapping can continue with less latency for the tracking. This same type of low-latency mapping can be performed during relocalization process 1114 as well. In some embodiments, this low-latency mapping does not include bundle adjustment, to ensure lower latency. The flow then continues in FIG. 12C, if a new keyframe candidate is detected.

FIG. 12C shows the bottom portion of FIG. 12A. As shown in keyframe creation 1120, various processes are optionally performed to increase the efficacy of the method. These processes include finding the position within a keyframe using tracking based on normalized cross-correlation (NCC) between pixel patches. When the camera starts to observe keypoints coming from outside of the map, a new set of keypoints needs to be computed (detected and described). Once this is done, a relationship is established between the newly computed keypoints and those that were tracked using NCC. To that end, the track trajectory is used to link the newly detected keypoints with those detected and tracked before.

In order to speed up this process, optionally a simplified version of a quadtree is used, for example and without limitation a SearchGrid2d. This is a structure for finding the nearest neighbors in the image space in an efficient way, through a grid search confined to a search area in the neighborhood.
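A minimal, non-limiting sketch of such a grid-based neighbor search follows; the class name, the cell size and the query interface are illustrative assumptions rather than the structure actually named SearchGrid2d.

    from collections import defaultdict

    class SearchGrid2D:
        # Bucket 2D points into fixed-size cells so that a neighbor query only
        # examines the cells overlapping the search radius.
        def __init__(self, cell_size=16.0):
            self.cell = cell_size
            self.buckets = defaultdict(list)

        def _key(self, x, y):
            return (int(x // self.cell), int(y // self.cell))

        def insert(self, x, y, payload=None):
            self.buckets[self._key(x, y)].append((x, y, payload))

        def neighbors(self, x, y, radius):
            cx, cy = self._key(x, y)
            reach = int(radius // self.cell) + 1
            found = []
            for ix in range(cx - reach, cx + reach + 1):
                for iy in range(cy - reach, cy + reach + 1):
                    for px, py, payload in self.buckets.get((ix, iy), []):
                        if (px - x) ** 2 + (py - y) ** 2 <= radius ** 2:
                            found.append((px, py, payload))
            return found

    grid = SearchGrid2D(cell_size=16.0)
    grid.insert(100.0, 120.0, "keypoint A")
    print(grid.neighbors(104.0, 118.0, radius=10.0))   # finds "keypoint A"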

Once a full list of descriptors (the tracked ones and the newly detected ones) has been established, one can proceed to build a new keyframe from the whole set of descriptors, as shown in a process 1226.

Next, preferably a process 1228 to use NCC to correct LK matches is performed. LK matches are prone to drift. Performing an NCC search is not computationally expensive and can be used to refine the match.

Next, a process 1230 to detect features and compute descriptors is performed. Then a process 1232 to fuse features is performed, preferably including removing duplicates, for example by using a quad-tree or similar.

Process 1122 checks that process 1120 succeeds in adding a new keyframe, after which the new keyframe proceeds to map update 1124.

As shown with regard to map update 1124, there is provided a new map points creation module 1240, which uses the new descriptors to add to the existing map. A local BA (bundle adjustment) process 1242 is performed. BA process 1242 relates to adjustment of the bundles of light rays originating from each 3D feature and converging on each camera's optical center. Adjustment is preferably performed with regard to structural and viewing parameters. Optionally, one or more local keyframes are removed, if they are redundant or otherwise not necessary, in a process 1244.

Optionally, a process may be applied to determine whether there is sufficient parallax between two keyframes, to determine whether both keyframes are to be kept, for example as part of initialization 1200 (before process 1202) or as part of process 1240. Optionally, different processes for determining parallax may be performed as part of initialization 1200 or process 1240.

FIG. 12D shows an exemplary, non-limiting flow diagram for calculating parallax sufficiency according to at least some embodiments. The process is described with regard to a plurality of points, and may optionally also be used to determine sufficient parallax for two images, by comparing a large enough number of points to determine whether the images have enough parallax to be useful.

As shown in a process 1250, the process begins with receiving two images and a map with three dimensional points in stage 1252. A point from the first image is projected to the 3D map in stage 1254. The first image point is then transformed with the essential matrix, to locate an image point on the second image, in stage 1256. The essential matrix may be calculated as described in greater detail below. The second image point is then projected onto the 3D map in stage 1258. Two rays are then determined: a first ray from the first image point onto its corresponding point on the 3D map, and a second ray from the second image point onto its corresponding point on the 3D map. These rays are triangulated in stage 1260.

In stage 1262, it is determined whether there is sufficient parallax between the rays. Sufficient parallax means that there is a sufficiently large angle between the rays, determined according to a threshold. The threshold may be, for example, from 1 to 5 degrees (absolute value). Factors that affect sufficiency of parallax include, but are not limited to, the camera lens focal length and pixel density.

If there is not sufficient parallax, then in stage 1264A, at least one image point of the pair of points is rejected. If there is sufficient parallax, then in stage 1264B, the image points are accepted.
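A non-limiting sketch of the parallax test follows: the angle between the two rays (from the camera centers to the triangulated point) is computed and compared to a threshold in the 1-5 degree range. The particular threshold value and the example geometry are assumptions for illustration.

    import numpy as np

    def parallax_deg(point_3d, cam_center_1, cam_center_2):
        # Angle, in degrees, between the two rays from the camera centers to the point.
        r1 = point_3d - cam_center_1
        r2 = point_3d - cam_center_2
        cos_angle = np.dot(r1, r2) / (np.linalg.norm(r1) * np.linalg.norm(r2))
        return float(np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0))))

    point = np.array([0.0, 0.0, 5.0])
    c1 = np.array([0.0, 0.0, 0.0])
    c2 = np.array([0.3, 0.0, 0.0])
    sufficient = parallax_deg(point, c1, c2) >= 2.0   # threshold chosen for illustration
    print(sufficient)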

Turning back to FIGS. 12A-12C, and specifically to FIG. 12C, loop closure process 1126 features, within loop optimization 1132, computing a similarity transformation capturing the loop closing error, in a process 1246. Preferably, this similarity transformation is a SIM3 transformation. Sim3 is a similarity transform in 3D: 3 DoF for translation, 3 DoF for rotation, and 1 DoF for scale. With loop closure, there are two solutions for locating the current camera position with respect to the map. These two solutions should be a single one, but, due to drift, they will diverge. Loop closing brings them back together. Next, loop fusion 1248 is performed as previously described. Optionally, essential graph optimization is performed in a process 1250, if the keyframes are organized within a graph with pointers, to reduce the computational resources required.

FIG. 13A shows a schematic graph of accelerometer data. For any rotation around the z-axis, the gravity vector stays aligned with the axis, therefore providing no extra information. On top of that, in the presence of accelerations different from gravity, the angle measurement cannot be achieved by only using the accelerometer, since the measured acceleration will no longer be 1 g. Therefore, another source of information is required to find the exact orientation of the accelerometer, to be able to remove the gravitational component of acceleration from the component due to the accelerometer's movement. In order to obtain the acceleration purely due to the movement, the accelerometer reading should be rotated to the global frame of reference, where the effect of gravity can be accounted for.

The gyroscope is a sensor which measures the angular velocity of the body to which it is attached (by using the Coriolis effect). It is possible to determine the rotation matrix from a single integration of the gyroscope's signal. Nonetheless, this integration introduces error in the orientation, due to the existence of a time-variant bias on the gyroscope's signal.

A magnetometer is a device capable of measuring the magnetic field across each one of the axes of the device. In the absence of any major electromagnetic interference, the magnetic field detected by this sensor is the one coming from the earth's magnetic field, which makes the magnetometer read the heading angle with respect to magnetic north as a global reference of orientation. An important aspect about using a magnetometer, however, is its vulnerability in the presence of additional electromagnetic sources, which can significantly distort the sensor's reading.

So, to improve the orientation estimation, one approach is to fuse the orientation calculated from the gyroscope with the tilt estimation from the accelerometer and the azimuth estimation from the magnetometer, using an optimal estimator such as a Kalman filter. The position can be obtained by a double integration of the acceleration in the global frame of navigation. However, drift occurs very quickly with (double) integration of accelerometer signals (seconds) and relatively quickly with (single) integration of gyroscope signals (minutes).

Although the IMU is prone to drift and to issues regarding the initial calibration, it does have a number of strengths that can counterbalance weaknesses of optical SLAM. For example and without limitation, the IMU operates at a high frequency (400 Hz for example), operates without regard to external illumination conditions, and provides reliable tracking over a short timespan.

Some optional uses for integrating the IMU data include finding the map scale and the gravity axis in the SLAM coordinate system (necessary to use accelerometer data), and dead reckoning via the IMU when visual SLAM may not be accurate.

Map scale may optionally be recovered as follows. The SLAM system provides 3D position p_s(t) and orientation R_s(t) as functions of time t. From that, it is possible to compute the accelerations of the camera, a_s(t), by numerically differentiating p_s twice with respect to t.

Assuming the IMU device and the camera sensor are placed at the same point, the SLAM data is related to the acceleration measured by the accelerometer, a_i(t), as follows:

a_i(t) = R_s(t) · (s · a_s(t) + g)

where g is the gravity vector expressed in m/s² in the SLAM world coordinate system, and s is the map scale.

By recording SLAM and IMU data during a correctly tracked motion that contains acceleration, it is possible to recover g and s.

It is possible to estimate position with the IMU (dead reckoning) as follows. Assume that visual SLAM tracking is accurate until time t, after which it ceases to be accurate. It is necessary to estimate the position at t+d.

The rotation estimate is initialized at the last known, accurately tracked orientation: A(t) = R(t)

Then one can recursively integrate rotation:

A(t+dt) = A(t) · expm(G(t) · dt)

where G(t) is a skew-symmetric matrix of the gyroscope readings and dt is the sampling period.

Then it is possible to initialize position and velocity estimates:

p(t) = p_s(t)

v(t) = (p_s(t) − p_s(t−dt)) / dt

The following can then be updated:

v(t+dt) = v(t) + a_i(t+dt) · dt

p(t+dt) = p(t) + v(t) · dt + 0.5 · a_i(t+dt) · dt²
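The dead-reckoning recursion above can be written compactly as in the following non-limiting sketch; the accelerations are assumed to be already expressed in the world frame with gravity removed, and the matrix exponential is taken from SciPy rather than implemented here.

    import numpy as np
    from scipy.linalg import expm

    def skew(w):
        # Skew-symmetric matrix G(t) built from the 3-vector of gyroscope readings.
        return np.array([[0.0, -w[2], w[1]],
                         [w[2], 0.0, -w[0]],
                         [-w[1], w[0], 0.0]])

    def dead_reckon_step(A, p, v, gyro, accel_world, dt):
        # A(t+dt) = A(t) * expm(G(t) * dt); then integrate velocity and position.
        A_next = A @ expm(skew(gyro) * dt)
        v_next = v + accel_world * dt
        p_next = p + v * dt + 0.5 * accel_world * dt ** 2
        return A_next, p_next, v_next

    A = np.eye(3)                       # rotation at the last accurately tracked time
    p = np.zeros(3)                     # position p_s(t)
    v = np.array([0.1, 0.0, 0.0])       # velocity from finite differences of p_s
    A, p, v = dead_reckon_step(A, p, v, gyro=np.array([0.0, 0.0, 0.01]),
                               accel_world=np.zeros(3), dt=1 / 400)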

FIG. 13B shows an exemplary, non-limiting flow diagram for determining the coordinate scale and gravity vector from IMU data according to at least some embodiments.

Monocular SLAM can only reconstruct the geometry of a scene up to a scale factor. The unit of the SLAM coordinate system is arbitrary, depending on how SLAM selects the unit when constructing the initial map. Since the IMU data provides device acceleration readings in metric units, it is possible to recover the scale of the SLAM coordinate system by comparing acceleration readings and accelerations computed from the SLAM camera trajectory. Since accelerometers also sense earth gravitation, the gravity vector in the SLAM coordinate system can also be evaluated.

As shown in a method 1300, the process starts with obtaining SLAM-based coordinates for the current location of the apparatus (device), in stage 1302. These coordinates determine a device position P_t and device orientation R_t in the SLAM coordinate system. Next, the IMU data is obtained in stage 1304, which provides accelerations a_IMU with gravity component g_IMU in the IMU coordinate system in metric units. In the SLAM coordinate system, the linear acceleration of the device measured by the IMU is:

$\begin{matrix}{a_{t} = {R_{t} \cdot \left( {a_{IMU} - g_{IMU}} \right)}} \\{= {{R_{t} \cdot a_{IMU}} - {R_{t} \cdot g_{IMU}}}} \\{= {{R_{t} \cdot a_{IMU}} - g}}\end{matrix}\quad$

in which g is the gravity vector, which is fixed in the SLAM coordinate system.

Let s be the scaling factor of the SLAM unit compared to the metric unit.

The device linear acceleration in SLAM units is:

$\begin{matrix}{a_{t} = {s \cdot \left( {{R_{t} \cdot a_{IMU}} - g} \right)}} \\{= {{s \cdot R_{t} \cdot a_{IMU}} - g^{*}}}\end{matrix}\quad$

in which g* = s·g is the fixed gravity vector in the SLAM coordinate system. Let v(t) be the velocity of the device in the SLAM coordinate system. It can be computed from finite differences of the visual SLAM trajectory in stage 1306. It is then possible to write the position of the device at time t as:

P_t = P_{t_0} + ∫_{t_0}^{t} v(t′) dt′ = P_{t_0} + v_{t_0} · (t − t_0) − (1/2) g* · (t − t_0)² + s · C(t)

in which

C(t) = ∫_{t_0}^{t} [ ∫_{t_0}^{t′} R(t″) · a_IMU(t″) dt″ ] dt′

which can be computed numerically by double integration. The term C(t) is the double integral of the rotated accelerometer readings, where the rotation R is obtained by integrating the gyroscope data.

For the above equation, R(t+dt) = R(t) · expm(G(t) · dt), where G(t) is a skew-symmetric matrix of gyroscope readings and dt is the sampling period.

For each t, there exists a linear equation with the scale and gravity as variables:

[ −(1/2)(t − t_0)² · I_{3×3}   C(t) ] · [ g* ; s ] = P_t − P_{t_0} − v_{t_0} · (t − t_0)

which can be solved, by stacking such equations, as a series of linear equations according to least squares. Solving the equations for enough (t_0, t) periods in the least-squares sense leads to the determination of the gravity vector and of the scale, which relates the SLAM coordinates to the metric measurements of the IMU.
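A non-limiting sketch of that least-squares solve follows: each (t_0, t) period contributes one 3-row block of the stacked linear system, and the unknowns are the three components of g* plus the scalar scale s. The sample data structure is an assumption made for the example.

    import numpy as np

    def solve_gravity_and_scale(samples):
        # Each sample is (dt, C_t, rhs): dt = t - t0, C_t the 3-vector from the
        # double integration, rhs = P_t - P_t0 - v_t0 * dt. Solve for [g*, s].
        rows, rhs_all = [], []
        for dt, C_t, rhs in samples:
            block = np.hstack([-0.5 * dt ** 2 * np.eye(3), C_t.reshape(3, 1)])
            rows.append(block)
            rhs_all.append(rhs)
        A = np.vstack(rows)                 # shape (3N, 4)
        b = np.concatenate(rhs_all)         # shape (3N,)
        x, *_ = np.linalg.lstsq(A, b, rcond=None)
        return x[:3], x[3]                  # gravity vector g*, scale s

    # Synthetic check: generate samples from a known g* and s, then recover them.
    g_true, s_true = np.array([0.0, 0.0, -9.8]), 0.5
    samples = []
    for dt in (0.1, 0.2, 0.3, 0.4):
        C_t = np.random.randn(3)
        rhs = -0.5 * dt ** 2 * g_true + s_true * C_t
        samples.append((dt, C_t, rhs))
    print(solve_gravity_and_scale(samples))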

The reference coordinate frame of SLAM, GSLAM, is different from that of the IMU, represented by GIMU. If the matrix that rotates data from frame A to frame B is denoted R_(A)^(B), then to align the frame of SLAM with that of the IMU:

$R_{GIMU}^{GSLAM} = R_{cam}^{GSLAM} \cdot R_{IMU}^{cam} \cdot R_{GIMU}^{IMU}$

where R_(cam)^(GSLAM) is the result of the visual SLAM process and R_(GIMU)^(IMU) is the output of orientation tracking using the IMU. Therefore, the two constant calibration matrices R_(IMU)^(cam) and R_(GIMU)^(GSLAM) can be obtained during a constrained optimization process at stage 1305.
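By way of non-limiting illustration, the frame alignment above is a composition of rotation matrices, as in the following sketch; the variable names are assumptions of the illustration, and the function simply evaluates the relation for given estimates of each rotation.

    import numpy as np

    def align_slam_and_imu_frames(R_gslam_from_cam, R_cam_from_imu, R_imu_from_gimu):
        # R_GIMU^GSLAM = R_cam^GSLAM . R_IMU^cam . R_GIMU^IMU
        R_gslam_from_gimu = R_gslam_from_cam @ R_cam_from_imu @ R_imu_from_gimu
        return R_gslam_from_gimu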

FIG. 13C shows an exemplary, non-limiting flow diagram for pose prediction according to at least some embodiments. Pose prediction is described for example with regard to process 1218 of FIGS. 12A-12C. As shown in a process 1350, the process begins with obtaining the gravity vector and SLAM coordinates scale in stage 1352, for example as described with regard to FIG. 13B. In stage 1354, the position and velocity of the device at the last successfully tracked position are provided. In stage 1356, accelerometer and gyroscope data are combined with the position and velocity of the device. Accelerometer data is combined through a double integration, while gyroscope data is combined through integration, up to the desired time for prediction t. In stage 1358, pose prediction is performed according to the following equation, for determining Pt:

$P_t = P_{t_0} + v_{t_0}(t - t_0) - \frac{1}{2}\,g^{*}(t - t_0)^{2} + s \cdot C(t)$
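As a non-limiting illustration of stages 1356-1358, the following sketch evaluates C(t) numerically by double integration of the rotated accelerometer readings and then applies the prediction equation (NumPy); the buffering of rotations and accelerations is an assumption of the illustration.

    import numpy as np

    def double_integrate(rotations, accels, dt):
        # Numerically evaluates C(t): the inner sum integrates R(t'')*a(t'') once,
        # the outer sum integrates the result a second time over the prediction window.
        inner = np.zeros(3)
        outer = np.zeros(3)
        for R, a in zip(rotations, accels):
            inner = inner + (R @ np.asarray(a, dtype=float)) * dt
            outer = outer + inner * dt
        return outer

    def predict_position(P0, v0, g_star, s, C_t, t, t0):
        # P_t = P_t0 + v_t0*(t - t0) - 1/2 * g* * (t - t0)^2 + s * C(t)
        dt = t - t0
        return P0 + v0 * dt - 0.5 * g_star * dt**2 + s * C_t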

FIG. 14 shows a schematic block diagram of an exemplary, non-limiting system for visual-inertial SLAM with IMU (inertial measurement unit) data according to at least some embodiments. In some implementations, SLAM system 1400 can include at least one computational device/computer 1407 (as indicated earlier, the terms/phrases of computer, processor and computational device can be used interchangeably in the present disclosure), a wearable device 1405, one or more optical sensors 1403, one or more IMU 1420 and optionally one or more other sensor(s) 1422. Optionally, at least one optical sensor 1403 and at least one IMU 1420 can be combined in a single device (not shown).

The computational device 1407 can include a sensor preprocessor 1402 and a SLAM analyzer 1404, and can be operatively coupled to the wearable device 1405 (e.g., wired or wirelessly), can be included in the wearable device 1405, and/or some combination thereof. Sensor preprocessor 1402 and SLAM analyzer 1404 can be separate processors in and of themselves in the computational device, or may be software modules (e.g., an application program and/or a set of computer instructions for performing SLAM functionality operational on one or more processors). In some implementations, the computational device 1407 can be configured to receive signal data (e.g., from the wearable device 1405), to preprocess the signal data so as to determine movement of the wearable device 1405, and to instruct the wearable device 1405 to perform one or more actions based on the movement of the wearable device 1405. Specifically, in some implementations, sensor preprocessor 1402 can receive the optical sensor data and the IMU data from the wearable device 1405, and can perform preprocessing on the data. For example, sensor preprocessor 1402 can generate abstracted optical sensor and IMU data based on the optical sensor and IMU sensor data.

SLAM analyzer 1404 is configured to operate a SLAM process so as to determine a location of wearable device 1405 within a computational device-generated map, as well as being configured to determine a map of the environment surrounding wearable device 1405. For example, the SLAM process can be used to translate movement of the user's head and/or body when wearing the wearable device (e.g., on the user's head or body). A wearable device that is worn on the user's head would, for example, provide movement information with regard to turning the head from side to side, or up and down, and/or moving the body in a variety of different ways. The wearable device may also be attached to a robot or other moving object. Such movement information is needed for SLAM to be performed.

In some implementations, because the preprocessed sensor data is abstracted from the specific sensors as described above, the SLAM analyzer 1404 may therefore be sensor-agnostic, and may perform various actions without knowledge of the particular sensors from which the sensor data was derived.

As a non-limiting example, if optical sensor 1403 is a camera (e.g., a digital camera with a resolution of, for example, 640×480 or greater, at any frame rate including, for example, 60 fps), then movement information may be determined by SLAM analyzer 1404 according to a plurality of images from the camera. For such an example, sensor preprocessor 1402 preprocesses the images before SLAM analyzer 1404 performs the analysis (which may include, for example, converting images to grayscale). Next, a Gaussian pyramid may be computed for one or more images, which is also known as a MIPMAP (multum in parvo map), in which the pyramid starts with a full resolution image, and the image is operated on multiple times, such that each time, the image is half the size and half the resolution of the previous operation. SLAM analyzer 1404 may perform a wide variety of different variations on the SLAM process, including one or more of, but not limited to, PTAM (Parallel Tracking and Mapping), as described for example in "Parallel Tracking and Mapping on a Camera Phone" by Klein and Murray, 2009 (available from ieeexplore.ieee.org/document/5336495/); DSO (Direct Sparse Odometry), as described for example in "Direct Sparse Odometry" by Engel et al, 2016 (available from https://arxiv.org/abs/1607.02565); or any other suitable SLAM method, including those as described herein.
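As a non-limiting illustration of such preprocessing, the following sketch (using OpenCV; the function name and the number of pyramid levels are assumptions of the illustration) converts a frame to grayscale and builds a Gaussian (MIPMAP-style) pyramid in which each level is half the size of the previous one.

    import cv2

    def preprocess_frame(frame_bgr, levels=4):
        # Convert the incoming camera frame to grayscale.
        gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
        # Build a Gaussian pyramid: each level halves the width and height.
        pyramid = [gray]
        for _ in range(levels - 1):
            pyramid.append(cv2.pyrDown(pyramid[-1]))
        return pyramid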

In some implementations, the wearable device 1405 can operatively couple the one or more optical sensor(s) 1403, the one or more IMU 1420 and the one or more other sensor(s) 1422 to the computational device 1407 (e.g., wired or wirelessly). The wearable device 1405 can be a device (such as an augmented reality (AR) and/or virtual reality (VR) headset, and/or the like) configured to receive sensor data, so as to track a user's movement when the user is wearing the wearable device 1405. The wearable device 1405 can be configured to send sensor data from the one or more optical sensor(s) 1403, the one or more IMU 1420 and the one or more other sensor(s) 1422 to the computational device 1407, such that the computational device 1407 can process the sensor data to identify and/or contextualize the detected user movement.

In some implementations, any or a combination of the one or more optical sensors 1403, the one or more IMU 1420 and the one or more other sensor(s) 1422 can be included in wearable device 1405 and/or separate from wearable device 1405.

Optical sensor 1403 can be a camera, for example one or more of an RGB, color, grayscale or infrared camera, a charge-coupled device (CCD), a CMOS sensor, a depth sensor, and/or the like. Other sensor(s) 1422 may include one or more of an accelerometer, a gyroscope, a magnetometer, a barometric pressure sensor, a GPS (global positioning system) sensor, a microphone or other audio sensor, a proximity sensor, a temperature sensor, a UV (ultraviolet light) sensor, and/or other sensors. IMU 1420 can be an accelerometer, a gyroscope, a magnetometer, a combination of two or more of the same, and/or the like. IMU 1420 preferably comprises an accelerometer and a gyroscope, and optionally and preferably further comprises a magnetometer.

As described in greater detail below, the IMU data and optical data are preferably combined by the SLAM process performed by SLAM analyzer 1404. Various methods are known in the art for such a combination, but the combining process is time-based, as the IMU data provides measurements with regard to time.

FIG. 15A shows an exemplary, non-limiting flow diagram for SLAM initialization according to at least some embodiments. As shown, a method 1500 begins with obtaining a reference frame F1 in stage 1502. Next, features are tracked in an incoming frame Fi in stage 1504. The features are analyzed to determine whether the current frame Fi can be used as a second reference frame, optionally in two stages as shown. In stage 1506, a homography relating features on F1 and Fi is computed. In stage 1508, an essential matrix relating features on F1 and Fi is computed. Stages 1506 and 1508 may be performed in any order or in parallel. A homography provides better results for planar or two-dimensional scenes (such as a wall, for example) than an essential matrix. An essential matrix is operative for any three-dimensional scene.

A homography is a mapping between two images of the same planar surface. A homography is also applicable to two projection planes having the same center of projection. The previously described RANSAC algorithm can estimate a homography and determine inliers at the same time. A homography may be used to determine inliers by testing whether two points represent the same feature on two images: a point on one image is transformed with the homography to locate the corresponding point on the second image, and the inverse transform is applied to take a point on the second image and locate it on the first image. The distances between the various pairs of points are then summed; if the summed distance is too great, then at least one point is an outlier. A similar method may be used for application of an essential matrix. These methods may be performed according to the ORB-SLAM paper (Mur-Artal et al, "ORB-SLAM: A Versatile and Accurate Monocular SLAM System", IEEE Transactions on Robotics, Volume 31, Issue 5, October 2015, pages 1147-1163).

An essential matrix is applicable to stereo scenes. In a three-dimensional scene, corresponding points lie on conjugate epipolar lines. Given a point in one image, multiplying by the essential matrix will determine which epipolar line to search along in the second view. The essential matrix can be estimated through various algorithms, including without limitation GOODSAC and RANSAC (Michaelsen et al, "ESTIMATING THE ESSENTIAL MATRIX: GOODSAC VERSUS RANSAC", Photogrammetric Computer Vision (2006), pp. 1-6).

Optionally, the homography is computed first and, if the result is sufficiently robust, stage 1508 (computation of the essential matrix) is skipped. Alternatively, both stages 1506 and 1508 are applied, and the best result is selected for subsequent stages. The process then determines whether there are enough inliers of points in the image, as determined with regard to the map points, in stage 1510. The determination of whether there are sufficient inliers relates to a threshold, which may optionally be set heuristically. For each candidate point, it is determined whether the distance described above is within a threshold distance. The threshold distance may be determined heuristically and may, for example, be up to 10 pixels, up to 5 pixels, up to 1 pixel or any distance in between. If the distance is within the threshold, then the first image point is included as an inlier. Otherwise it is rejected as an outlier.
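As a non-limiting illustration of stages 1506-1510, the following sketch (using OpenCV) estimates both models with RANSAC, counts inliers, and keeps the better result; the thresholds and minimum inlier count are illustrative heuristics, not values taken from the disclosure.

    import cv2
    import numpy as np

    def select_initialization_model(pts1, pts2, K, reproj_threshold=5.0, min_inliers=50):
        # pts1, pts2: matched feature coordinates in frames F1 and Fi (Nx2 float arrays).
        # K: camera intrinsic matrix.
        H, mask_h = cv2.findHomography(pts1, pts2, cv2.RANSAC, reproj_threshold)
        E, mask_e = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC,
                                         prob=0.999, threshold=reproj_threshold)
        inliers_h = int(mask_h.sum()) if mask_h is not None else 0
        inliers_e = int(mask_e.sum()) if mask_e is not None else 0
        # Keep whichever model explains more correspondences.
        model = ('homography', H, inliers_h) if inliers_h >= inliers_e \
                else ('essential', E, inliers_e)
        # Heuristic check corresponding to stage 1510: enough inliers to proceed.
        return model if model[2] >= min_inliers else None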

If there aren't sufficient inliers, the process preferably returns to stage 1504, or may return to stage 1502, for reasons described in greater detail below (1511). If there are sufficient inliers, the process preferably continues to stage 1512 to determine whether there are enough inliers with enough parallax. A detailed description of a non-limiting, exemplary method for determining whether there is enough parallax is provided with regard to FIG. 12D.

If stage 1510 fails, the process continues to stage 1511. If stage 1512 fails, the process continues to stage 1514A. Each of stage 1511 or stage 1514A returns the process to stage 1504; if more than a threshold number of failures have occurred, stage 1514A returns the process to stage 1502.

In stage 1514B, initial pose and map computation is performed. If the initial pose and map are successfully determined, the process continues to stage 1516. Otherwise, the process continues to stage 1514C, which returns the process to stage 1504; if more than a threshold number of failures have occurred, stage 1514C returns the process to stage 1502.

In stage 1516, an initial bundle adjustment is performed. According to some embodiments, the initial bundle adjustment is performed with two frames F1 and Fi, and a 3D map point, as described for example with regard to FIG. 4.

According to some embodiments, the initial bundle adjustment is performed with optical and IMU data according to a time-based method, such as a spline camera trajectory, for example. A detailed description of a non-limiting example of such a method is provided with regard to FIG. 15B. Briefly, the method operates by determining a relative position of the camera in both space and time, with 6DOF (degrees of freedom). The motion of the camera is parameterized and then preferably interpolated with a spline, for example by using quaternion interpolation. It is possible to calculate six unknowns for the 6DOF or, alternatively, to analyze data points that are captured relatively closely together, so that only the differences between the parameters need to be determined. The spline also assists in reducing the number of unknowns to be calculated, as it can be used to interpolate the data between control points of the spline. A non-limiting example of a method for performing such a spline-based parameterization is described by Lovegrove et al (Spline Fusion: A continuous-time representation for visual-inertial fusion with application to rolling shutter cameras, Proc. BMVC, 2013).

Preferably, for better results, projected coordinates of matches along the spline are determined for each frame, to determine whether the tracking is operating sufficiently well such that predicted matches have similar coordinates to actual matches. The accelerometer data is assumed to match the camera acceleration, as the two sensors are part of the same hardware unit and/or are otherwise physically connected so as to be traveling at the same speed. Similarly, the gyroscope data is assumed to match the camera rotation speed. These assumptions reduce the complexity of the calculations.

Also, if a spline camera trajectory method is used, the coordinates of the features from frame Fi in stage 1504 are preferably stored in RAM.

The same or a similar process may be used in initialization of the map, such as for example process 1210 of FIGS. 12A-12C, or in bundle adjustment, such as for example process 1242 of FIGS. 12A-12C. When used for initialization of the map, a basic set of control points for the spline is preferably available, whether from a previous mapping process or from another source.

FIG. 15B shows an exemplary, non-limiting flow diagram for calculating a spline camera trajectory according to at least some embodiments. A process 1550 begins with receiving tracked trajectories of features, initial geometry and IMU data points, in stage 1552. The optical data is preferably parameterized as described above, for example by using a homography and/or the essential matrix. The IMU data preferably includes at least gyroscope and accelerometer data. A plurality of control points is determined, for example according to a plurality of time points. The time points may be determined according to elapsed time, for example. Next, the spline control points are initialized according to a linear interpolation of the initial geometry, in stage 1554. The data analysis seeks to minimize errors when determining the location (position and orientation) of the point of the spline in three-dimensional space. Each control point represents a control moment and features a key frame, but not all key frames are necessarily associated with control moments.

The data analysis then preferably proceeds with a loop, in which the error is minimized. The loop includes evaluating the objective function in stage 1556 and then refining the control points and the map points in stage 1558, for example according to an algorithm such as Levenberg-Marquardt or Gauss-Newton. The process then loops back to stage 1556 for minimizing the error. These two stages are preferably repeated until a minimum error is achieved. Minimizing the error preferably includes minimizing the reprojection error (of the map points along the feature trajectories), the gyroscope error and the accelerometer error.
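As a non-limiting illustration of such a refinement loop, the following sketch (using SciPy) stacks the reprojection, gyroscope and accelerometer residuals and minimizes them with a Levenberg-Marquardt solver; the residual callables, the weights and the packing of the state vector are assumptions of the illustration.

    import numpy as np
    from scipy.optimize import least_squares

    def refine_spline_and_map(x0, reprojection_residuals, gyro_residuals, accel_residuals,
                              w_reproj=1.0, w_gyro=1.0, w_accel=1.0):
        # x0 packs the spline control points and the map point coordinates into one vector.
        def stacked_residuals(x):
            return np.concatenate([w_reproj * reprojection_residuals(x),
                                   w_gyro * gyro_residuals(x),
                                   w_accel * accel_residuals(x)])
        # method='lm' is Levenberg-Marquardt; 'trf' behaves similarly for large sparse problems.
        result = least_squares(stacked_residuals, x0, method='lm')
        return result.x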

The spline may also be used for pose prediction; once the spline has been determined as described above, and data from the inertial tracking system is received (preferably including gyroscope and accelerometer data), it is possible to predict the pose at the next time point according to a combination of the spline and the data.

In some embodiments, a differential equation solution can be used in place of a spline-based approach as described above for calculating a trajectory. In some embodiments, a combination of the two types of solutions can be used.

FIG. 16 shows an exemplary, non-limiting flow diagram for SLAM initialization with interpolation of inertial tracking data according to at least some embodiments. As shown, a method 1600 begins with obtaining a reference frame F1 in stage 1602. Next, visual tracking is optionally reset if there is no motion (that is, no change from a previous image) in stage 1604. Features are tracked in an incoming frame Fi in stage 1606. The features are analyzed to determine whether the current frame Fi can be used as a second reference frame, optionally in a plurality of stages as shown.

In stage 1608A, a homography is computed, while in stage 1608B, an essential matrix is computed. These stages may optionally be performed as previously described.

Next, initialization errors for inertial tracking and visual tracking are calculated in stage 1610.

A weighted SLERP (spherical linear interpolation) of the IMU rotation and the rotation as calculated from the homography (stage 1608A) and/or the essential matrix (stage 1608B) is then performed in stage 1612. The weight for this interpolation depends upon the initialization error. This interpolation may be used to correct errors occurring from analysis of the optical data. A SLERP is a method which allows interpolation between two rotations represented as quaternions, and may be performed according to any suitable method.
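As a non-limiting illustration of the weighted SLERP of stage 1612, the following sketch (using SciPy rotations) blends the IMU rotation with the rotation recovered from the homography or essential matrix; the specific weighting rule based on the two initialization errors is an assumption of the illustration.

    import numpy as np
    from scipy.spatial.transform import Rotation, Slerp

    def fuse_rotations(R_imu, R_visual, err_imu, err_visual):
        # R_imu, R_visual: scipy Rotation objects; err_*: initialization errors.
        # The rotation with the smaller error receives the larger weight.
        w = err_imu / (err_imu + err_visual)   # w near 1 pulls the result toward R_visual
        rotations = Rotation.from_quat(np.vstack([R_imu.as_quat(), R_visual.as_quat()]))
        slerp = Slerp([0.0, 1.0], rotations)
        return slerp(w)                        # interpolated rotation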

A non-limiting example of a method to calculate SLERP may be found in the SOPHUS library, found in the GitHub repository as follows: https://github.com/strasdat/Sophus. This library uses Lie groups to parameterize a rotation according to three values so that a minimizer can handle it efficiently. Certain Lie groups represent transformations in two-dimensional and three-dimensional space, including transformations of rotations in three-dimensional space, such as those represented by the above rotations. They can be applied, according to the associated Lie algebra, to calculate a SLERP interpolation.

Another non-limiting example of a method to calculate SLERP may be found in the Mobile Robot Programming Toolkit (https://www.mrpt.org/tutorials/programming/maths-and-geometry/slerp-interpolation/).

In stage 1614, the results determined from the translation of inertial tracking are optionally applied. However, the translation from inertial tracking may not be sufficiently accurate to use as a measurement, because of the relatively long time window required during initialization. Optionally, translation from inertial tracking is used to validate results calculated by visual tracking if initialization is corrupted. Alternatively, stage 1614 is not performed.

The process then determines whether there are enough inliers of points in the image, as determined with regard to the map points, and/or enough parallax, in stage 1616. This determination is performed by computations using the homography and/or the essential matrix, optionally after application of the results of stages 1610 and 1612, or stages 1610-1614. Non-limiting, exemplary methods for performing such determinations were previously described.

If there are not sufficient inliers and/or parallax, the process continues to stage 1618A, which returns the process to stage 1604. If more than a certain number of failures have occurred, or the number of inliers has dropped below a defined threshold, stage 1618A returns the process to stage 1602.

If there are sufficient inliers and/or parallax, then in stage 1618B, an initial bundle adjustment is performed with two frames F1 and Fi, and a 3D map point, as described for example with regard to FIG. 4.

FIG. 17A shows an exemplary, non-limiting flow diagram for determining a key moment according to at least some embodiments. An explanation of key moments is provided with regard to FIG. 17B below.

As shown in FIG. 17A, a process 1700 begins with providing a tracker in stage 1702, such as for example tracking module 1110. The tracker maintains a buffer, shown as tracking buffer 1730, which is preferably continuously refreshed as a circular buffer with tracking data. The tracking data includes, but is not limited to, one or more of a map point ID, an image frame timestamp and the 2D coordinates at which the map point has been observed.

A separate IMU buffer 1732 is preferably continuously refreshed as a circular buffer with IMU data. The IMU data includes, but is not limited to, one or more of a timestamp, gyroscope data, accelerometer data and optionally magnetometer data.

Next, in stage 1704, after tracking initialization or recovery, the last_key_moment parameter is initialized to be equal to the frame_timestamp parameter (that is, the timestamp of the last tracked frame). For each incoming frame in stages 1706 and 1708, the frame is tracked in stage 1706. According to the results of tracking in stage 1706, tracking buffer 1730 is then updated.

Next, a keyframe decision is made as to whether the frame is to be selected as a keyframe, for example as described in the ORB-SLAM paper or as described with regard to FIG. 12 above, in stage 1708. If the decision is to accept the frame as a keyframe, then a key moment is defined as described with regard to stage 1710. The key moment is defined as starting at MAX(current_frame_timestamp−k, last_key_moment) and ending at current_frame_timestamp (inclusive). The constant "k" is determined empirically, typically in seconds, such as for example 1 second or any other suitable value. In stage 1712, data from the tracking buffer 1730 and the IMU buffer 1732 between the starting and ending times are copied to the key moment.
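As a non-limiting illustration of stages 1710-1712, the following sketch assembles a key moment from the tracking and IMU buffers; the dictionary-based buffer entries and the timestamp field name are assumptions of the illustration.

    def make_key_moment(tracking_buffer, imu_buffer, current_frame_timestamp,
                        last_key_moment, k=1.0):
        # The key moment window starts at MAX(current_frame_timestamp - k, last_key_moment)
        # and ends at current_frame_timestamp (inclusive); k is in seconds.
        start = max(current_frame_timestamp - k, last_key_moment)
        end = current_frame_timestamp
        key_moment = {
            'tracking': [e for e in tracking_buffer if start <= e['timestamp'] <= end],
            'imu':      [e for e in imu_buffer if start <= e['timestamp'] <= end],
        }
        # After the key moment is sent to the mapping module (stage 1716),
        # last_key_moment is set to current_frame_timestamp (stage 1714).
        return key_moment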

Next, in stage 1714, the last_key_moment parameter is set equal to the current_frame_timestamp. In stage 1716, the key moment is sent to the mapping module. Optionally, stages 1714 and 1716 are performed in parallel or in either order. The process then returns to stage 1706 for the next frame.

FIG. 17B shows an exemplary, non-limiting schematic diagram of a spline with a plurality of key moments and key frames. As shown, diagram 1750 features a spline 1752 for tracking motion of the optical sensor through a plurality of map points 1758. Spline 1752 is determined according to a plurality of key frames 1754 and key moments 1756. As noted above, in a visual SLAM system, the problem of computational resources is handled by selecting a sparse subset of the frames, called key frames. For a visual-inertial SLAM system, the complete time window from when the camera acquisition started to the present time is not considered. Instead, only a sparse partition of this time is considered, keeping only "key moments".

Preferably key moments 1756 cover the geometry of spline 1752.

Key moments 1756 also preferably cover periods when the device is accelerated, to capture scale. Key moments preferably cover "links" between previously mapped areas and newly discovered areas, so that IMU data relates new map points to old ones. Static moments (with no movement) are preferably discarded. Dynamic moments over an already mapped area may be discarded if the scale of this area has already been estimated reliably.

Any and all references to publications or other documents, including but not limited to, patents, patent applications, articles, webpages, books, etc., presented in the present application, are herein incorporated by reference in their entirety.

Example embodiments of the devices, systems and methods have been described herein. As noted elsewhere, these embodiments have been described for illustrative purposes only and are not limiting. Other embodiments are possible and are covered by the disclosure, which will be apparent from the teachings contained herein. Thus, the breadth and scope of the disclosure should not be limited by any of the above-described embodiments but should be defined only in accordance with claims supported by the present disclosure and their equivalents. Moreover, embodiments of the subject disclosure may include methods, systems and apparatuses which may further include any and all elements from any other disclosed methods, systems, and apparatuses. In other words, elements from one or another disclosed embodiment may be interchangeable with elements from other disclosed embodiments. In addition, one or more features/elements of disclosed embodiments may be removed and still result in patentable subject matter (and thus, resulting in yet more embodiments of the subject disclosure). Correspondingly, some embodiments of the present disclosure may be patentably distinct from one and/or another reference by specifically lacking one or more elements/features. In other words, claims to certain embodiments may contain negative limitations to specifically exclude one or more elements/features, resulting in embodiments which are patentably distinct from the prior art which includes such features/elements.

What is claimed is:
1. An apparatus, comprising: a wearable device; an optical sensor coupled to the wearable device; a computational device; a simultaneous localization and mapping (SLAM) analyzer configured to operate on the computational device and to receive optical sensor data from said optical sensor and having a localization processor and a fast mapping processor, the fast mapping processor configured to rapidly create a map from said optical sensor data; and a map refinement processor to refine said map according to said optical sensor data; wherein said localization processor is configured to localize the optical sensor according to said optical sensor data within said map according to a SLAM process; and wherein at least two of said localization processor, said fast mapping processor, and said map refinement processor are configured to operate at a separate process speed of said computational device.
2. The apparatus of claim 1, wherein said computational device comprises a mobile computational device.

3. The apparatus of claim 2, wherein said computational device comprises a cellular phone.
4. The apparatus of claim 3, wherein said wearable device comprises headgear for mounting said wearable device to a user, said cellular phone comprises said optical sensor, and said cellular phone is mounted on said headgear or is otherwise connected to it.

5. The apparatus of claim 1, wherein said SLAM analyzer is configured to operate said map refinement processor at a first process speed and to operate said fast mapping processor at a second process speed, said first process speed being substantially slower than said second process speed.
6. The apparatus of claim 5, wherein said first process speed is at least 50% slower than said second process speed.
7. The apparatus of claim 5, wherein: said localization processor comprises a tracking processor and at least one of said SLAM analyzer and localization processor is configured to operate said tracking processor at a third process speed, said third process speed being different from said first process speed and different from said second process speed and being substantially faster than said second process speed; and said tracking processor is configured to localize said optical sensor on said map according to said optical sensor data and according to a last known position of said optical sensor on said map.
8. The apparatus of claim 7, wherein said third process speed is at least five times faster than said second process speed.
9. The apparatus of claim 7, wherein said tracking processor is configured to reduce jitter by spreading error across localizations.
10. The apparatus of claim 7, wherein said map refinement processor is configured to calibrate said optical sensor according to an estimate of difference between said map before and after said map refinement processor refines said map.
11. The apparatus of claim 7, wherein: said SLAM analyzer further comprises a map changes processor, and said map changes processor is configured to detect a change in the environment of said optical sensor represented by said map.
12. The apparatus of claim 11, further comprising an outside application configured to be operated by said computational device for manipulating, locating or representing an object, wherein said map changes processor is configured to send a message signal to said outside application that: a particular object has been moved, a particular object has disappeared from its last known location, or a new specific object has appeared.
13. The apparatus of claim 12, wherein said outside application comprises a VR (virtual reality) application or an AR (augmented reality) application.
14. The apparatus of claim 13, wherein: said outside application is an AR application; said SLAM analyzer further comprises a real object locator; and said real object locator determines a location and geometry of a physical object in an environment external to the apparatus, and provides said location and geometry to said AR application.
15. A method for performing SLAM for an apparatus comprising a wearable device, a sensor attached to the wearable device, a computational device, and a simultaneous localization and mapping (SLAM) analyzer operated by the computational device, the method comprising: receiving sensor data from said sensor by said SLAM analyzer; performing a SLAM process by said SLAM analyzer, said SLAM process comprising: simultaneously dynamically constructing a map and locating the wearable device according to said sensor data within said dynamically constructed map, wherein said SLAM process is adapted to be performed by said limited resources of said computational device; performing a fast mapping process to rapidly create said dynamically constructed map from said sensor data; performing a localization process to localize said wearable device in said dynamically constructed map according to said sensor data; and performing a map refinement process to refine said dynamically constructed map according to said sensor data, wherein: said map refinement processor is operated at a first process speed of said computational device and said fast mapping processor is operated at a second process speed of said computational device, said first process speed being substantially slower than said second process speed so as to adapt said SLAM process to be performed by said computational device.
16. The method of claim 15, wherein said first process speed is at least 50% slower than said second process speed.

17. The method of claim 15, wherein: said performing a localization process comprises a tracking process operated at a third process speed of the computational device, said third process speed being different from said first process speed and different from said second process speed and being substantially faster than said second process speed.

18. The method of claim 17, wherein said third process speed is at least five times faster than said second process speed.